Dataset for Learning Analytics

The AFEL Dataset for Learning Analytics is itself a collection of datasets useful for analytics in online and social learning contexts. It is distilled from the content of the AFEL Data Catalogue, excluding datasets that contain user-centred data or that are not freely redistributable.

The datasets in this collection can be downloaded individually as RDF dumps. This page links to the snapshot of each dataset as of March 2017. Together they contain over 434 million distinct RDF triples, obtained both by refactoring existing linked datasets made available by members of the AFEL project and by re-engineering third-party datasets that were not originally in RDF (e.g. Coursera, OU Analyse, Outline Maps).

Except where otherwise noted, all dumps are provided as bzip2-compressed N-Triples or N-Quads files (RDF serialisation formats without and with named graph indications, respectively).
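
Because N-Triples and N-Quads are line-oriented (one statement per line), a compressed dump can be streamed without first decompressing it to disk. The following is a minimal Python sketch using only the standard library, not a tool supplied with the dataset. Its tokenising regular expression is a simplification of the W3C grammar (for instance, it does not handle a comment that follows a statement on the same line), and the function names and file name are hypothetical. For serious processing, a full RDF parser such as rdflib is the safer choice.

```python
import bz2
import re
from typing import Iterator, Optional, Tuple

# One RDF term: an IRI, a blank node label, or a literal with an optional
# language tag or datatype IRI. This simplifies the W3C grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[A-Za-z0-9_\-]+(?:\.[A-Za-z0-9_\-]+)*'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'
)


def parse_statement(line: str) -> Optional[Tuple[str, ...]]:
    """Return (s, p, o) or (s, p, o, g) for one line, or None if the line
    is blank, a comment, or not recognised."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    terms = tuple(TERM.findall(line))
    return terms if len(terms) in (3, 4) else None


def iter_statements(path: str) -> Iterator[Tuple[str, ...]]:
    """Stream statements from a bzip2-compressed N-Triples/N-Quads file."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            stmt = parse_statement(line)
            if stmt is not None:
                yield stmt


def count_statements(path: str) -> Tuple[int, int]:
    """Return (triples, quads): statements without and with a graph label."""
    triples = quads = 0
    for stmt in iter_statements(path):
        if len(stmt) == 3:
            triples += 1
        else:
            quads += 1
    return triples, quads
```

For example, `count_statements("dump.nt.bz2")` (hypothetical file name) reads the whole file once and returns a pair such as `(n, 0)` for a pure N-Triples dump.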

Coursera MOOC Discussion Threads

Anonymised versions of the discussion threads from the forums of 60 Coursera Massive Open Online Courses (MOOCs), about 100,000 threads in total.

When citing the dataset, please use the following reference:

Rossi, L.A. and Gnawali, O. Language independent analysis and classification of discussion threads in Coursera MOOC forums. IEEE International Conference on Information Reuse and Integration (IRI), August 2014.

Source: Data repository on GitHub | 4,927,697 triples | dump (38 MB), alignments | license

DBLP – Computer Science Bibliography

Linked Data export of open bibliographic information on major journals and proceedings in computer science.

Source: L3S, DBLP | 164,973,975 triples | dump (636 MB) | license

LAK Dataset

The LAK Dataset makes publicly available machine-readable versions of research sources from the Learning Analytics and Educational Data Mining communities.

Source: Linked Data for Learning Analytics community | 90,968 triples | dump (6 MB), alignments | license: other (open)

LRMI Resource metadata

A collection of online learning resources annotated in accordance with the Learning Resource Metadata Initiative (LRMI) and collected between 2013 and 2015.
The data are provided as one set of N-Quads per year.

Source: ITD-CNR | 115,113,763 triples | dump (1.2 GB)
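
In N-Quads, the optional fourth term of each statement names the graph it belongs to. The minimal standard-library Python sketch below tallies statements per graph label across several compressed N-Quads files, for instance the yearly sets described above. It is an illustration, not part of the dataset's tooling. The tokenising regular expression simplifies the W3C N-Quads grammar, statements without a graph term are counted under a placeholder label, and the function name and file names are hypothetical.

```python
import bz2
import re
from collections import Counter
from typing import Iterable

# One RDF term: IRI, blank node label, or literal with optional language tag
# or datatype IRI (a simplification of the W3C N-Quads grammar).
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[A-Za-z0-9_\-]+(?:\.[A-Za-z0-9_\-]+)*'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'
)

# Placeholder label for statements that carry no graph term.
DEFAULT_GRAPH = "(default graph)"


def tally_graphs(paths: Iterable[str]) -> Counter:
    """Count statements per graph label across bzip2-compressed N-Quads files."""
    counts: Counter = Counter()
    for path in paths:
        with bz2.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank and comment lines
                terms = TERM.findall(line)
                if len(terms) == 4:
                    counts[terms[3]] += 1  # fourth term is the graph label
                elif len(terms) == 3:
                    counts[DEFAULT_GRAPH] += 1
    return counts
```

For example, `tally_graphs(["lrmi-2013.nq.bz2", "lrmi-2014.nq.bz2"]).most_common(5)` (hypothetical file names) would list the five graphs with the most statements.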

Open University courses

Online courses, material and learning opportunities provided by The Open University.
When using or redistributing the dataset, please attribute it to The Open University.

Source: The Open University | 1,110,249 triples | dump (4.5 MB) | license: CC BY 3.0

OU Analyse

Anonymised Open University Learning Analytics Dataset (OULAD). It contains data about courses, students and their interactions with the Virtual Learning Environment (VLE) for seven selected courses held at The Open University.

When citing the dataset, please use the following reference:

Kuzilek, J., Hlosta, M., Herrmannova, D., Zdrahal, Z. and Wolff, A. OU Analyse: Analysing At-Risk Students at The Open University. Learning Analytics Review, no. LAK15-1, March 2015, ISSN: 2057-7494.

Source: The Open University | 54,584,125 triples | dump (302 MB), alignments | license: CC BY 4.0

Outline Maps (Slepé mapy)

Data used for modelling quizzes for adaptive learning of geography, as published by Slepé mapy ("Outline Maps" in English). The data are taken from a snapshot as of May 2015.

Source: Adaptive Learning, Masaryk University | 70,711,263 triples | dump (385 MB), alignments | license: ODbL

Web of Know-How

A Linked Data framework for human tasks and procedures, with data re-engineered from WikiHow and SnapGuide.
When using or redistributing the dataset, please attribute it to WikiHow, SnapGuide and the Web of Know-How project.

Source: Web of Know-How | 23,073,020 triples | dump (641 MB) | license: CC BY-NC 4.0