Dataset Documentation

Welcome to version 2.5 of the Open Syllabus dataset! Open Syllabus gathers and analyzes one of the largest collections of college course syllabi in the world. At a glance:

  • 9,210,279 syllabi (22.3 billion words of full-text data).

  • Collected from 8,727 colleges and universities in 153 countries. Coverage is currently deepest in the US, UK, Australia, and Canada, which together account for about 75% of the corpus.

  • Spanning roughly 2010 → present. Our earliest documents date back to the late 1990s, but most of the data is concentrated in the last 10 years.

Syllabi are incredibly rich documents. They often contain: long-format descriptions of the course material; lists of books, articles, and web resources assigned in the class; descriptions of learning objectives, grading criteria, and assignments; and chronologically ordered sequences of readings and topics.

But there is very little standardization in how these elements are organized and presented, which makes them difficult to analyze systematically at scale. To help with this, Open Syllabus uses a suite of machine learning models to extract structured metadata from the documents. As of version 2.5, we provide:

  • Institution – The college or university where the course was taught, with metadata about institutional characteristics from IPEDS, GRID, Wikidata, and Carnegie.

  • Course code + section – The identifier for the course that appears in the institutional catalog.

  • Course title – The name of the course.

  • Year + semester – The calendar year and semester in which the course was taught.

  • Field – The department in which the course was taught.

  • Course description – Narrative description of the course content.

  • Book and article citations – The set of books and articles that appear on the syllabus.

  • Required readings – Citations for books, articles, or other materials that are required for the course, as opposed to merely cited or recommended.

  • Learning outcomes – Lists of skills or competencies that students are expected to acquire.

  • Topic outline – Lists of topics covered in the course.

  • Assessment strategy + grading rubric – How grades are assigned.

  • Assignment schedule – Week-by-week sequences of readings, assignments, and topics.
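
To make the list above concrete, here is a minimal sketch of how one extracted record might be modeled in Python. Every class and field name below (SyllabusRecord, Citation, course_code, field_name, and so on) is hypothetical and chosen only to mirror the bullets; it is not the dataset's actual schema. The sketch also assumes that any field can be missing, since the values are extracted by models rather than entered by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    # Hypothetical shape for one extracted book or article citation.
    title: str
    authors: List[str] = field(default_factory=list)
    required: bool = False  # True if the work is required, not just cited


@dataclass
class SyllabusRecord:
    # Illustrative field names only; they mirror the list above,
    # not the real column names in the dataset.
    institution: Optional[str] = None
    course_code: Optional[str] = None
    section: Optional[str] = None
    title: Optional[str] = None
    year: Optional[int] = None
    semester: Optional[str] = None
    field_name: Optional[str] = None
    description: Optional[str] = None
    citations: List[Citation] = field(default_factory=list)
    learning_outcomes: List[str] = field(default_factory=list)
    topic_outline: List[str] = field(default_factory=list)
    grading_rubric: Optional[str] = None
    schedule: List[str] = field(default_factory=list)

    def required_readings(self) -> List[Citation]:
        """Return only the citations flagged as required."""
        return [c for c in self.citations if c.required]
```

Modeling required readings as a flag on each citation, rather than as a separate list, is just one possible design; it keeps the two citation bullets in sync at the cost of an extra filter step.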

If you’re new to the project, check out our web-facing views onto the data:

Open Syllabus Explorer

A comprehensive view of the most frequently assigned books and articles in the corpus, sliced by author, institution, field, country, and publisher.


Open Syllabus Co-assignment Galaxy

An interactive visualization of the underlying “co-assignment graph” – the network of relationships among books and articles formed by aggregating over all pairs of titles that appear together in the same courses.
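
The pair aggregation behind a co-assignment graph can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Galaxy's actual pipeline: the function name co_assignment_counts and the sample titles are made up, each course is assumed to be a plain list of title strings, and the edge weight is assumed to be the number of courses in which two titles appear together.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def co_assignment_counts(
    courses: Iterable[Iterable[str]],
) -> Dict[Tuple[str, str], int]:
    """Count how many courses assign each unordered pair of titles."""
    counts: Counter = Counter()
    for titles in courses:
        # De-duplicate within a course, then sort so that the pair
        # (a, b) and the pair (b, a) map to the same key.
        unique = sorted(set(titles))
        counts.update(combinations(unique, 2))
    return dict(counts)


# Toy input: each inner list is the set of titles on one syllabus.
toy_courses = [
    ["Republic", "Leviathan", "The Prince"],
    ["Republic", "Leviathan"],
    ["The Prince"],
]
edges = co_assignment_counts(toy_courses)
```

Each key in edges is an edge of the graph and each value is its weight; a course with a single title contributes no edges.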