NPRG036 - Data Formats

Basic information - winter 2024

  1. The lectures and tutorials take place on-site, in person. Slides and videos in English and Czech from previous years are provided on this webpage. This year, the lecture and one of the tutorials will be taught in English.
  2. Homework will be done in groups and will have 4 parts with corresponding deadlines.
  3. All 4 parts of the homework need to be turned in before the individual deadlines in order to proceed to the final exam.

Lectures - Thursdays 12:20 in S3

  1. 2024-10-03: Data formats introduction: Google Slides, YouTube (English), YouTube (Czech)
  2. 2024-10-10: Graph data formats - RDF, RDF Schema, Linked Data, Open World Assumption: Google Slides, YouTube (English), YouTube (Czech)
  3. 2024-10-17: Graph data formats - SPARQL: Google Slides, YouTube (English), YouTube (Czech)
  4. 2024-10-24: Graph data formats - Basic vocabularies, Wikidata: Google Slides, YouTube (English), YouTube (Czech)
  5. 2024-10-31: Graph data formats - Labeled property graph model, Cypher, RDF-star: Google Slides, YouTube (English), YouTube (Czech)
  6. 2024-11-07: Hierarchical data formats - XML, XML Schema: Google Slides, YouTube (English), YouTube (Czech)
  7. 2024-11-14: No lecture
  8. 2024-11-21: Hierarchical data formats - XPath, XSLT: Google Slides, YouTube (English), YouTube (Czech)
  9. 2024-11-28: Hierarchical data formats - JSON, JSON Schema, JSON-LD: Google Slides, YouTube (English), YouTube (Czech)
  10. 2024-12-05: Relational data formats - SQL dump, CSV, CSV on the Web: Google Slides, YouTube (English), YouTube (Czech)
  11. 2024-12-12: Formats for geodata by guest speaker Michal Med: PDF, YouTube
  12. 2024-12-19: Key-value, configuration formats - .properties, INI, TOML, YAML: Google Slides, YouTube (English), YouTube (Czech), Formats for text documents: Google Slides, YouTube (English), YouTube (Czech)
  13. 2025-01-09: Multimedia formats - images, video, audio, containers, print formats: Google Slides, YouTube (English), YouTube (Czech), Print formats on YouTube (Czech)

Tutorials

This section links to the tutorials with examples. There are three tutorial groups per week. The tutorials are split into (R) Recommended, where we go through what you need for the homework, and (O) Optional, which are shorter and can be practiced at home, so you only need to come to the tutorial if you want to consult something (such as the homework).

  1. T1: Thursdays 14:00, S4 - bring your own laptop!, Czech
  2. T2: Thursdays 15:40, SW2, English
  3. T3: Wednesdays 12:20, SU2, Czech

Schedule and slides

The slides contain assignments to be practiced during the tutorial. If you run into problems, ask during the tutorial. Tutorials are numbered starting from the first one after the first lecture, i.e. T1 and T2 have Tutorial 1 in the first week, and T3 has Tutorial 1 in the second week. Exact dates for the groups are available in the tooltip when you hover over the tutorial number below.

  1. Tutorial 1 (R): Conceptual Modeling
  2. Tutorial 2 (R): RDF
  3. Tutorial 3 (R): SPARQL
  4. Tutorial 4 (O): Wikidata
  5. Tutorial 5 (R): LPG & Cypher
  6. Tutorial 6 (R): XML & XML Schema, No tutorial for T3 on 2024-11-13
  7. Tutorial 7 (R): XPath & XSLT, No tutorial for T1 and T2 on 2024-11-14
  8. Tutorial 8 (R): JSON, jq, JSON Schema, JSON-LD
  9. Tutorial 9 (R): CSV, CSV on the Web
  10. Tutorial 10 (O): Geodata - GeoJSON, WKT, CRS, QGIS
  11. Tutorial 11 (O): Key-value formats - TOML, YAML, Formats for text documents
  12. Tutorial 12 (O): Multimedia formats

Homework

Homework will be done in groups and will have 4 parts. All 4 parts of the homework need to be turned in using the SIS Study group roster module before the individual deadlines in order to proceed to the final exam. The tutor's comments on the homework solutions must be addressed by the time the next part is turned in. Before turning in a homework part, double-check the assignment and the common errors, and make sure you satisfy all requirements.

Homework part 1: Conceptual model

Submission deadline
Before the 3rd tutorial of each group.
  1. 2024-10-17T14:00:00 for T1
  2. 2024-10-17T15:40:00 for T2
  3. 2024-10-23T12:20:00 for T3
Assignment
See the homework 1 assignment.

Homework part 2: Graph models

Submission deadline
Before the 6th tutorial for T1 and T2, before 2024-11-13T12:20:00 for T3.
  1. 2024-11-07T14:00:00 for T1
  2. 2024-11-07T15:40:00 for T2
  3. 2024-11-13T12:20:00 for T3
Assignment
See the homework 2 assignment.

Homework part 3: Hierarchical models

Submission deadline
Before the 9th tutorial of each group.
  1. 2024-12-05T14:00:00 for T1
  2. 2024-12-05T15:40:00 for T2
  3. 2024-12-11T12:20:00 for T3
Assignment
See the homework 3 assignment.

Homework part 4: Relational model

Submission deadline
Before the 10th tutorial of each group.
  1. 2024-12-12T14:00:00 for T1
  2. 2024-12-12T15:40:00 for T2
  3. 2024-12-18T12:20:00 for T3
Assignment
See the homework 4 assignment.

Homework feedback

You will receive feedback on your homework from me via e-mail. The feedback may be one of the following kinds:

Everything is OK and you get a ✅ in SIS.
Minor issues
You get a ✅ in SIS. You need to fix those along with the next HW.
Regular issues
You do not get ✅ in SIS until you fix them. You need to fix them along with the next HW to be able to continue. If you do not fix those with the next HW, you fail the course.
Major issues
You need to fix those ASAP and let me know when you do. These issues will prevent you from doing the next assignment correctly. If you do not fix those with the next HW at the latest, you fail the course.
Fatal issues
Typically resulting from not following instructions in the HW assignments, or completely missing parts. You need to fix those ASAP and let me know when you do. If this kind of issue appears for the second time, you fail the course.
Missed deadline
In case the deadline passes and there is no solution turned in by your group, you fail the course, unless the reason is serious, e.g. medical.

Homework groups

Be ready to work on the homework with your group throughout the semester, and communicate. In case of problems with your team, such as a member or the leader not communicating, let me know as soon as possible to avoid problems with deadlines. The final deadline for fixing all HW feedback is 2025-01-10T20:00:00. There must be no errors in the HWs by then.

Avoid splitting homework topics among members in such a way that some members do not participate in a certain topic at all. Those members would not practice the topic enough, and it is also unfair, as the individual HW parts differ in difficulty.

I suggest splitting the team members for each topic into creators and verifiers, rotating the roles throughout the semester. In addition, I suggest establishing communication channels, regular team meetings, and internal deadlines for creation and verification at least a few days before the submission deadline.

Common troubles with group homework

Group member or leader not communicating or not doing their part
  1. Contact me, do not hesitate. I will contact the non-communicating member and ask for an explanation.
    • This may be due to illness, which can happen
    • If necessary, I will remove the member from the group
    • If necessary, I will appoint a new group leader
  2. A reduction in group size is not a reason to reduce the homework scope
    • Assignments are doable even single-handedly, but teamwork is part of the experience
  3. A non-communicating group member is not a reason for a deadline extension
    • Do your homework early, not a day before the deadline
    • Set internal team deadlines, check your groupmates’ solutions
    • It is unacceptable to say you missed a deadline because one teammate was responsible for a certain task and did not deliver.
  4. Communicate!
    • If you are ill or otherwise unable to work, let your group know ASAP
    • If you are removed from a team, you will fail this course

Exams

See a sample test.