In which we get deeply nerdy about museum dates…

At The IC we spend a lot of time thinking about museum data, because when we build a museum website it has to work seamlessly with the museum's collections and events data. As we've built GLAMkit, we've realised that some widespread issues need general solutions.

One of the particular challenges faced by almost every museum is how to deal consistently with dates. Most of the time, computers are used to dealing with complete, precise, unambiguous dates, such as "August 28, 1979". If everything is unambiguous, it's easy to say whether one date is the same as another, or comes before, or after.

As with many things, life isn't so simple when it comes to museum data…


If your collection is geological, a date like "August 28, 3 million BC" isn't quite right.

If your collection is at all historical, then you're very quickly going to run into uncertainties and approximations.

For example, if you know a vase is from the Edo period, then it was probably made between 1603 and 1868, give or take. You may even be able to narrow it down to a decade, say the 1770s. But getting a complete and unambiguous date is unrealistic at best, misleading at worst.

For database designers, it makes more sense to treat museum dates less like a single point in time, and more like a region of time.


In the collections databases we've worked with, like Axiell EMu and Vernon CMS, it's most common to store a 'display date' string, which can be any text, plus two precise dates that indicate the start and finish of a region of time.
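As a rough illustration of that display-plus-two-dates shape, here is a minimal Python sketch. The class name, field names and the example vase are hypothetical, not taken from EMu, Vernon or GLAMkit; the point is only that a region of time can be queried by overlap even when the label is free text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionDate:
    """Hypothetical record: a free-text label plus a machine-readable range."""

    display_date: str  # any text, shown to visitors as-is
    earliest: date     # start of the region of time
    latest: date       # end of the region of time

    def overlaps(self, start: date, end: date) -> bool:
        # Two closed ranges overlap when each one starts before the other ends.
        return self.earliest <= end and start <= self.latest


# Invented example: a vase narrowed down to the 1770s.
vase = CollectionDate("Edo period, c. 1770s", date(1770, 1, 1), date(1779, 12, 31))

print(vase.overlaps(date(1775, 1, 1), date(1800, 12, 31)))  # True
print(vase.overlaps(date(1800, 1, 1), date(1850, 12, 31)))  # False
```

Note that nothing ties `display_date` to the two bounds: the label can say anything, which is where the trouble starts.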

Occasionally, there's no computer-readable date information, which leaves a museum with no way to query its collection by date!

But even when there is, this approach still presents several problems –

  1. The 'display date' string isn't consistent. One institution might be internally consistent in how it writes dates, but there are few standards shared between institutions. As a result, there's not much computers can do with it.
    • One institution may prefer "c. August 1969",
    • another may prefer "Aug 1969 (approx)";
    • let alone something like "Printed c. 1865; reprinted 1869-70".
  2. Natural language has ambiguity. What does "1969?" with a question mark mean? Does it mean "1969 but we're not certain", or "certainly around 1969"?
  3. Sometimes a date region is precise ("June 1965"), and sometimes it's not ("around April 1965"). Are these differences reflected in the date range?
  4. If you squint hard enough, you begin to see even more nuances. For example, the phrase "June 1965" has a different meaning to "a day in June 1965". One's a month, one's a day. "1880s" is a decade, "in the 1880s" is some point within that decade - but we don't know whether a year, month, or day is meant.
  5. Sometimes our way of writing about time doesn't let us be that precise. Does '1800s' mean a year in the range 1800-1809 or a year in the range 1800-1899?
  6. Sometimes you need a way to shrug - "we just don't know", or "it's still happening".
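To make the '1800s' ambiguity from the list concrete, here is a small sketch that returns every year range such a label could mean rather than silently picking one. The function name and the returned dictionary keys are invented for illustration.

```python
from typing import Dict, Tuple


def candidate_ranges(label: str) -> Dict[str, Tuple[int, int]]:
    """Return each (first_year, last_year) range a label like '1800s' could mean."""
    digits = label[:-1]
    if not (label.endswith("s") and len(digits) == 4 and digits.isdigit()):
        raise ValueError(f"expected a label like '1880s', got {label!r}")
    year = int(digits)
    if year % 10 != 0:
        raise ValueError(f"{label!r} does not start on a decade boundary")

    ranges = {"decade": (year, year + 9)}
    if year % 100 == 0:
        # A label such as '1800s' can also be read as the whole century.
        ranges["century"] = (year, year + 99)
    return ranges


print(candidate_ranges("1880s"))  # {'decade': (1880, 1889)}
print(candidate_ranges("1800s"))  # {'decade': (1800, 1809), 'century': (1800, 1899)}
```

A display string alone can't tell you which reading the cataloguer intended, so any automatic conversion has to either choose a convention or flag the record for review.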


Extended Date/Time Format

This is where EDTF (Extended Date/Time Format) comes in. It's a format specified by the Library of Congress that gives us a way to distinguish between these nuances, in a way that computers can make sense of.

For example, EDTF lets us specify things like,

  • "approximately August 1984" (In EDTF: "1984-08~"), or
  • "a day in August 1984" ("1984-08-uu").

All normal ISO 8601 dates are valid EDTF dates too (e.g. "1984-08-28").
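To show why this notation is easy for computers to work with, here is a toy reader for just the simple forms quoted above: a year, an optional month and day (either of which may be "uu"), and optional "?" and "~" markers. It is a sketch using only the standard library, not the EDTF library's parser; the regular expression and the `describe` function are assumptions made for illustration, and it doesn't validate that months or days are in range.

```python
import re

# Only covers YYYY, YYYY-MM and YYYY-MM-DD, with "uu" allowed for month/day
# and an optional trailing "?" and/or "~".
SIMPLE_EDTF = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2}|uu)(?:-(?P<day>\d{2}|uu))?)?"
    r"(?P<flags>[?~]{0,2})$"
)


def describe(edtf_text):
    """Report which nuances a simple EDTF string carries."""
    match = SIMPLE_EDTF.match(edtf_text)
    if match is None:
        raise ValueError(f"not one of the simple forms handled here: {edtf_text!r}")
    flags = match.group("flags")
    return {
        "approximate": "~" in flags,
        "uncertain": "?" in flags,
        "unspecified": [p for p in ("month", "day") if match.group(p) == "uu"],
    }


print(describe("1984-08~"))    # approximate, nothing unspecified
print(describe("1984-08-uu"))  # an unspecified day
print(describe("1984-08-28"))  # a plain ISO 8601 date: no markers at all
```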

There are several levels of EDTF. At the deep end, it gets complex. The EDTF text

2004-06-(01)~/2004-06-(20)~

… means "An interval in June 2004 beginning approximately the first and ending approximately the 20th".

[1760-01, 1760-02, 1760-12..]

… means "January or February of 1760 or December 1760, or some later month".

We've implemented the EDTF specification in Python, and have just released a new 2.0 version that for the first time covers the entire spec (you can get it here), and includes a Django model field for storing EDTF values in the database.

The EDTF library means we can take a date like "1969-12" (December 1969) and store it in a database, and derive a range of time from it (e.g. 1st-31st December). That means it becomes easy to sort and filter collections that have imprecise or approximate dates. It gives us an incredibly powerful way to deal with time in the way museums think about it.
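The idea of deriving a range from an imprecise date can be sketched with the standard library alone. This is not the EDTF library's code or API; `month_bounds` is a hypothetical helper that only understands "YYYY-MM" strings with years that Python's `datetime.date` can represent.

```python
import calendar
from datetime import date


def month_bounds(year_month):
    """Return (first_day, last_day) for a 'YYYY-MM' string."""
    year, month = (int(part) for part in year_month.split("-"))
    # monthrange() returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


print(month_bounds("1969-12"))
# (datetime.date(1969, 12, 1), datetime.date(1969, 12, 31))

# Once every record has bounds, sorting and filtering are ordinary comparisons.
works = ["1984-08", "1969-12", "1972-02"]
print(sorted(works, key=month_bounds))  # ['1969-12', '1972-02', '1984-08']

seventies = [w for w in works
             if month_bounds(w)[0] >= date(1970, 1, 1)
             and month_bounds(w)[1] <= date(1979, 12, 31)]
print(seventies)  # ['1972-02']
```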

Natural Dates

That leaves us with one problem: that no-one knows how to write EDTF, or if they do, they almost certainly aren't using it in their collections data. Instead, they're probably using the display-plus-2-dates approach in their collections management system. We need a way to derive the EDTF text from a plain English description of a date.

So we also made an EDTF natural language parser. It converts real-world display dates that we found in collection data into EDTF form.

Here are some examples – natural language on the left, EDTF on the right after our parser has done its work. First some basic examples, then some more complex cases:

  'January 12, 1940' => '1940-01-12'
  '90' => '1990' # implied century
  'January 2008' => '2008-01'

Uncertain and Approximate Dates

  '1860?' => '1860?'
  '1862 (uncertain)' => '1862?'
  'circa Feb 1812' => '1812-02~'
  'ca.1860' => '1860~'
  'approx 1860' => '1860~'

Decades

   '1860s' => '186x'

Seasons

   'Summer 1872' => '1872-22'

Before/After

  'earlier than 1928' => 'unknown/1928'
  'later than 1928' => '1928/unknown'
  'before January 1928' => 'unknown/1928-01'
  'after about the 1920s' => '192x~/unknown'

Unspecified parts

  'year in the 1860s' => '186u'
  'month in 1872' => '1872-uu'
  'day in January 1872' => '1872-01-uu'
  'day in 1872' => '1872-uu-uu'

Centuries

  '1st century' => '00xx'
  '10c' => '09xx'
  '19th century?' => '18xx?'

Just showing off now...

   'a day in about Spring 1849?' => '1849-21-uu?~'
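To give a feel for how this kind of conversion can work, here is a deliberately tiny converter covering a handful of the patterns above: plain years, decades, "Month YYYY", approximation words and uncertainty markers. It is an illustrative sketch with invented names (`toy_text_to_edtf`, `month_number`), not our parser, and it simply returns None for anything it doesn't recognise, such as seasons or centuries.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]

APPROX = re.compile(r"^(circa|ca\.?|c\.|approx\.?|about)\s*", re.IGNORECASE)
UNCERTAIN = re.compile(r"\s*(\?|\(uncertain\))$", re.IGNORECASE)


def month_number(word):
    """Map 'Feb', 'Sept' or 'February' to 2, 9, 2; return None if not a month."""
    word = word.lower()
    for number, name in enumerate(MONTH_NAMES, start=1):
        if len(word) >= 3 and name.startswith(word):
            return number
    return None


def toy_text_to_edtf(text):
    """Convert a few simple display dates to EDTF text, or return None."""
    text = text.strip()
    # Strip the qualifiers first and remember whether each one was present.
    text, uncertain = UNCERTAIN.subn("", text)
    text, approximate = APPROX.subn("", text)

    decade = re.fullmatch(r"(\d{3})0s", text)
    month_year = re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{4})", text)
    month = month_number(month_year.group(1)) if month_year else None

    if decade:
        core = decade.group(1) + "x"
    elif re.fullmatch(r"\d{4}", text):
        core = text
    elif month is not None:
        core = "%s-%02d" % (month_year.group(2), month)
    else:
        return None
    return core + ("?" if uncertain else "") + ("~" if approximate else "")


for sample in ["January 2008", "1862 (uncertain)", "circa Feb 1812",
               "ca.1860", "1860s", "Summer 1872"]:
    print(repr(sample), "=>", repr(toy_text_to_edtf(sample)))
```

Real display dates are far messier than this, which is why a purpose-built parser tested against actual collection data is worth having.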

See it in action

The SFMOMA collection uses our EDTF library behind the scenes to query and sort works by date.

You can see EDTF 2.0 in GLAMkit's collections models (install GLAMkit from here). GLAMkit uses the EDTF library to handle its museum collections, but the EDTF library isn't GLAMkit- or even Django-specific. Any Python project can use it.

If you're a Python coder, you can install it with

pip install edtf

or see the source code at GitHub. While you're there, check out the outstanding issues – we'd love ideas and contributions!

If you want to start using EDTF in your collections data, we can work with you to automatically create EDTF dates from your existing date information.

Get in touch to start a conversation.


Greg has been building websites for 17 years. He is an interaction designer and computer scientist specializing in emerging forms of interaction. A founding member of the Interaction Consortium, he is currently the CTO of the Australian Centre for the Moving Image.
