University of Southampton OCS (beta), CAA 2012

Imperfect temporal information in data sets
Koen Van Daele

Last modified: 2011-12-16

Abstract


Cultural heritage in general, and archaeology in particular, often deal with imperfect information. Yet databases and information systems are generally only suited to handling well-defined data and information. This contribution focuses on the problems of dealing with imperfect temporal information and presents a possible solution.

Historical data are seldom complete and accurate. Trying to store this information in a typical RDBMS therefore presents many difficulties: for example, how should the birth date of a person be stored if we only have a general idea of when that person was born? One could of course store temporal information as simple text (e.g. "circa 700 AD"), but this leaves us without the ability to actually use the information in queries and filters.

This paper presents a way of dealing with imperfect temporal data, based on research in computer science, and more specifically the concept of fuzzy sets.

For querying and analyzing temporal data, it is important to look at the possible relations between two dates or periods, in other words between two time intervals. In 1983, Allen defined 13 possible relations between two sharp time intervals. The specific needs of heritage data led us to define five additional composite relations.
Of course, time intervals in historical contexts are seldom sharp; they are usually fuzzy. The research of Nagypál & Motik and that of Schockaert offer different algorithms for dealing with fuzzy time intervals. Each algorithm has its own strengths and weaknesses.
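
As an illustration of the sharp case only, the 13 relations defined by Allen can be determined by comparing interval endpoints. The following Python sketch is not the implementation described in this paper; the function name, the tuple representation and the relation labels are assumptions made for the example, and the five additional composite relations are not covered.

```python
from typing import Tuple

# Assumption for this sketch: a sharp interval is a (start, end) pair
# of numbers (e.g. years) with start strictly before end.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("each interval needs start < end")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Only partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For instance, a lifespan of (1820, 1840) compared with a building period of (1800, 1850) yields "during". With fuzzy intervals such a relation no longer simply holds or fails but holds to a certain degree, which is what the algorithms mentioned above address.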

Combining the theoretical models mentioned above, we created a working implementation of fuzzy time intervals and of the different algorithms for determining the relations between them, using a PostgreSQL RDBMS with the PostGIS extension. The algorithm of Nagypál & Motik and both Schockaert algorithms were tested on a dataset consisting of the birth and death dates of historic persons related to historic buildings in Flanders (mainly architects). For many of these persons, either the birth date or the death date is completely or partially unknown. The tests show that storing this information as fuzzy time intervals is feasible.
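
To make the idea of a fuzzy time interval concrete, one common way to model it is a trapezoidal membership function: a core period in which the date certainly falls, flanked by ramps where membership decreases linearly to zero. The Python sketch below assumes this trapezoidal shape; the class name, field names and the bounds chosen for "circa 700 AD" are hypothetical and do not describe the PostgreSQL implementation used in the tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyInterval:
    """Trapezoidal fuzzy time interval (an assumed representation)."""

    earliest: float    # membership is 0 at or before this point
    core_start: float  # membership is 1 from here ...
    core_end: float    # ... up to here
    latest: float      # membership is 0 at or after this point

    def __post_init__(self) -> None:
        if not (self.earliest <= self.core_start
                <= self.core_end <= self.latest):
            raise ValueError("bounds must be in non-decreasing order")

    def membership(self, t: float) -> float:
        """Degree (0.0 to 1.0) to which time t belongs to the interval."""
        if self.core_start <= t <= self.core_end:
            return 1.0
        if t <= self.earliest or t >= self.latest:
            return 0.0
        if t < self.core_start:
            # Rising ramp between earliest and core_start.
            return (t - self.earliest) / (self.core_start - self.earliest)
        # Falling ramp between core_end and latest.
        return (self.latest - t) / (self.latest - self.core_end)


# Hypothetical reading of "circa 700 AD": certainly 695-705,
# possibly as early as 690 or as late as 710.
circa_700 = FuzzyInterval(earliest=690, core_start=695,
                          core_end=705, latest=710)
```

Unlike the text string "circa 700 AD", such a value can be used in queries and filters: circa_700.membership(700) is 1.0, while a year on one of the ramps, such as 692.5, has a membership of 0.5.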

Querying this dataset using the different algorithms produces different results and different levels of performance. Our tests show that one of the two algorithms designed by Schockaert offers a good balance of features and performance. We therefore consider this algorithm the most promising for everyday querying of large datasets.


Keywords


temporal; fuzzy sets; fuzzy time intervals; imperfect; vague