NPRG036 - Data Formats - Homework 3 assignment

What
In this homework, you will create data schemas, data samples and queries using hierarchical data models and query languages, including transformations of the hierarchical data to RDF. You should use the same data instances as in HW2, this time in other data formats.
  1. Fix the previous homework based on the tutor's notes. The XSLT transformation and JSON-LD mapping in this homework should target the fixed HW2 version (see below).
  2. For representation of your data in hierarchical data formats, you will first need to create one or more hierarchical diagrams corresponding to the conceptual one, showing how your data will be structured into hierarchies. Each hierarchical diagram will use directed associations, showing the nesting relation, and will have a root class with no incoming associations. It might be necessary to split the data into multiple hierarchies to avoid some redundancies.
  3. Example of a conceptual model:
  4. Example of a hierarchical model:
  5. For each hierarchical model create a corresponding XML Schema, enforcing proper datatypes.
  6. Represent the data in XML files valid against the created XML Schemas. Use the xml:lang attribute to denote the natural language of text values.
  7. Create a set of non-trivial XPath queries to query the XML data.
  8. Create a non-trivial XSLT transformation producing an HTML representation of a reasonable subset of your data.
  9. Create XSLT transformations producing an RDF Turtle representation of your data. This is called a "lifting transformation": it lifts the data to a semantically more precise representation. The resulting RDF data representation should be identical to the one from Homework 2.
  10. For each hierarchical model create a corresponding JSON Schema, enforcing proper datatypes.
  11. Represent the data in JSON files valid against the created JSON Schemas.
  12. Create a JSON-LD context mapping the JSON representations to RDF. The resulting RDF data representation should be identical to the one from Homework 2. This might require changing or amending the JSON representation and the JSON Schemas. Use the JSON-LD playground to view the RDF N-Quads representation. Use the Apache Jena riot command-line tool to transform the result into RDF Turtle.
  13. Create a set of non-trivial jq queries to query the JSON data.
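To illustrate the xml:lang requirement from the list above, here is a minimal Python sketch that reads language-tagged text values from an XML fragment. The element names (catalog, book, title) and the sample values are hypothetical and not part of the assignment; only the xml:lang attribute and its fixed XML namespace are standard.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample data; your own element names will differ.
SAMPLE = """<catalog>
  <book>
    <title xml:lang="en">The Castle</title>
    <title xml:lang="cs">Zámek</title>
  </book>
</catalog>"""


def titles_by_language(xml_text):
    """Map each xml:lang value to the text of the title carrying it."""
    root = ET.fromstring(xml_text)
    return {t.get(f"{{{XML_NS}}}lang"): t.text for t in root.iter("title")}


if __name__ == "__main__":
    print(titles_by_language(SAMPLE))
```

The same language tags are what the lifting transformation would be expected to carry over into language-tagged RDF literals.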
Quantitative requirements
  1. At least 3 instances of each class. In the case of inheritance hierarchies, one instance of each child class is enough.
  2. Every attribute used at least once.
  3. At least 3 instances of each association.
  4. At least 4 non-trivial XPath queries
  5. At least 4 non-trivial jq queries
How
  1. Replace the HW2 file with a fixed one in the HW2 column.
  2. To the HW3 column, upload a zipped file named NPRG036-HW3-<groupID>.zip, e.g. NPRG036-HW3-T1G4.zip.
  3. The zip file will contain a folder named 3, containing:
    1. Folder model containing:
      1. Files hierarchy-1.svg, hierarchy-2.svg, ... with the hierarchical models
    2. Folder xml containing:
      1. Folder schemas containing:
        1. Files schema-1.xsd, schema-2.xsd, ... with XML Schema schemas
      2. Folder data containing:
        1. Files data-1.xml, data-2.xml, ... with data valid against the respective schemas, linked to them via the appropriate attributes
      3. Folder queries containing:
        1. Files query-1-1.xpath, query-1-2.xpath, query-2-1.xpath, ... with executable queries, with their meaning explained in XPath comments. The first number identifies the data file the query should be run against. The second number distinguishes the individual queries run against the same file.
      4. Folder xslt-html containing:
        1. File toHtml.xslt, transforming one of the XML data files to HTML
      5. Folder xslt-rdf containing:
        1. Files toRdf-1.xslt, toRdf-2.xslt, ... transforming the individual XML data files to RDF Turtle
      6. Folder rdf containing:
        1. RDF Turtle files data-1.ttl, data-2.ttl, ... with data resulting from the toRdf-* XSLT transformations of the XML data files.
    3. Folder json containing:
      1. Folder schemas containing:
        1. Files schema-1.json, schema-2.json, ... with JSON Schema schemas
      2. Folder data containing:
        1. JSON-LD files data-1.jsonld, data-2.jsonld, ... with data valid against the respective schemas and interpretable as RDF. The JSON-LD context should be included in the data files.
      3. Folder rdf containing:
        1. RDF Turtle files data-1.ttl, data-2.ttl, ... with data resulting from the JSON-LD mapping of the JSON data files.
      4. Folder queries containing:
        1. Files query-1-1.jq, query-1-2.jq, query-2-1.jq, ... with executable queries. The first number identifies the data file the query should be run against. The second number distinguishes the individual queries run against the same file.
        2. File readme.txt in UTF-8 explaining the meaning of the individual queries.
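Before uploading, the folder layout above can be checked mechanically. The following Python sketch is one possible way to do that, not an official checker: the folder names are taken from the list above, while the function names and the simplification of only checking that each folder contains at least one file are assumptions.

```python
import io
import zipfile

# Folder names copied from the required layout; file names inside are not checked.
REQUIRED_DIRS = [
    "3/model/",
    "3/xml/schemas/", "3/xml/data/", "3/xml/queries/",
    "3/xml/xslt-html/", "3/xml/xslt-rdf/", "3/xml/rdf/",
    "3/json/schemas/", "3/json/data/", "3/json/rdf/", "3/json/queries/",
]


def missing_folders(names):
    """Return the required folders that hold no file in the archive listing."""
    return [
        d for d in REQUIRED_DIRS
        if not any(n.startswith(d) and n != d for n in names)
    ]


def check_zip(path):
    """List the missing folders of a zip file on disk (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        return missing_folders(zf.namelist())


def demo_listing():
    """Build a deliberately incomplete archive in memory and return its listing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("3/model/hierarchy-1.svg", "<svg/>")
        zf.writestr("3/xml/schemas/schema-1.xsd", "")
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for folder in missing_folders(demo_listing()):
        print("missing:", folder)
```

Calling check_zip("NPRG036-HW3-T1G4.zip") on a real submission returns an empty list when every required folder is populated.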

Frequently Asked Questions (FAQ)

What is a trivial query?
Listing of entities of a certain type, optionally filtered by a certain value.
Counting of entities of a certain type, optionally filtered by a certain value.
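The two trivial shapes described above can be sketched in Python as follows. The data and element names are hypothetical, and ElementTree supports only a subset of XPath, so this only illustrates what listing and counting with an optional filter look like; in full XPath the equivalents would be expressions such as //book[@genre='novel']/title and count(//book[@genre='novel']).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data used only for illustration.
SAMPLE = """<library>
  <book genre="novel"><title>A</title></book>
  <book genre="poetry"><title>B</title></book>
  <book genre="novel"><title>C</title></book>
</library>"""

root = ET.fromstring(SAMPLE)


def list_titles(genre):
    """Trivial shape 1: list entities of one type, filtered by one value."""
    return [b.findtext("title") for b in root.findall(f".//book[@genre='{genre}']")]


def count_books(genre):
    """Trivial shape 2: count entities of one type, filtered by one value."""
    return len(root.findall(f".//book[@genre='{genre}']"))


if __name__ == "__main__":
    print(list_titles("novel"))
    print(count_books("poetry"))
```

Queries of this kind would not count toward the required minimum.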

Common errors

RDF triples generated from XML using XSLT or by translating JSON-LD do not match those in HW2
The point of the "lifting transformation" part of the homework is to practice transformation of data from XML to RDF and from JSON to RDF. The resulting triples should, therefore, use the same IRIs, predicates and classes, and the literals should have the same data types and language tags as in HW2.
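One rough way to check the match described above is to convert both the HW2 data and the HW3 output to N-Triples (a format riot can emit) and compare the two files as sets of lines. The Python sketch below assumes both files come from the same serializer and contain no blank nodes, since blank node labels and whitespace differences would otherwise show up as false mismatches; the function names are hypothetical.

```python
def load_triples(text):
    """Return the set of non-empty, non-comment lines of an N-Triples document."""
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def diff_triples(expected_text, actual_text):
    """Return (missing, extra): triples only in expected, triples only in actual."""
    expected = load_triples(expected_text)
    actual = load_triples(actual_text)
    return sorted(expected - actual), sorted(actual - expected)


if __name__ == "__main__":
    # Hypothetical triples: the language tag differs between the two versions.
    hw2 = '<http://example.org/b1> <http://example.org/title> "Zámek"@cs .\n'
    hw3 = '<http://example.org/b1> <http://example.org/title> "Zámek" .\n'
    missing, extra = diff_triples(hw2, hw3)
    print("missing from HW3:", missing)
    print("unexpected in HW3:", extra)
```

An empty result for both lists suggests, under the stated assumptions, that the IRIs, datatypes and language tags agree.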
Overusing for-each in XSLT instead of using XSL templates
When writing XSLT, try to use the natural flow of the XSLT processor, which traverses the XML document tree and applies templates. Writing XSLT as one big template full of xsl:for-each loops instead is bad practice.
xsd:bool, xsd:int, xsd:float datatypes
xsd:bool does not exist. It is xsd:boolean. xsd:int exists, but it is a more specific data type than xsd:integer, which is the one used everywhere on the web. xsd:float exists, but the one usually used is xsd:double.
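The datatype advice above can be turned into a small self-check over generated Turtle. This Python sketch assumes the XML Schema namespace is bound to the prefix xsd: in your files; the function name and the hint format are hypothetical. Note that only xsd:bool is an actual error, while the hints for xsd:int and xsd:float are suggestions.

```python
import re

# Replacement suggestions taken from the note above.
SUGGESTED = {"bool": "boolean", "int": "integer", "float": "double"}


def datatype_hints(turtle_text):
    """Return a hint for each xsd: datatype name that the note above warns about."""
    hints = []
    # The greedy match captures whole names, so xsd:integer is not reported as xsd:int.
    for name in re.findall(r"\bxsd:([A-Za-z]+)\b", turtle_text):
        if name in SUGGESTED:
            hints.append(f"xsd:{name} -> xsd:{SUGGESTED[name]}")
    return hints


if __name__ == "__main__":
    sample = '"true"^^xsd:bool ; "3"^^xsd:integer ; "1.5"^^xsd:float'
    for hint in datatype_hints(sample):
        print(hint)
```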