Archaeologists all over the world produce survey datasets recording archaeological remains at the current land surface. However, they do not do so according to a single method or database design, which makes it difficult to join or merge these datasets in a meaningful way. In addition to evolving agreement on best recording practices, the digitisation, in recent decades, of many such datasets has removed some practical obstacles to the merging of the datasets. The RHP team has therefore decided to run a pilot project in which three large datasets of archaeological survey finds in the area around Rome are to be merged into a single data structure suitable for subsequent scientific analysis.
The goal of this project is twofold:
- to show that such an integration is methodologically possible and does not result in too much loss of fine-grained data, and
- to obtain a dataset of considerable size, through which questions of a more regional scale may be answered.
Contributing to the database
To successfully integrate new datasets in the RHP database, data need to have integrity and adhere to the format developed by the RHP. To enable comparative and aggregate analyses the consortium has developed general site and material classification schemes, and a shared chronological framework.
Your survey database, like the three original RHP datasets, may not have been designed and populated with database integrity in mind. This means that you may not have ensured referential integrity (e.g. finds may belong to non-existent survey units), you may have used few or no input controls (allowing entry of incorrect values), you may not have described your database fields or your attribute values (leading to uncertainty about their meaning), and you may have redundancies in your data (i.e. have stored duplicate information).
All of these issues have to be found and fixed, which takes time and effort, and requires intensive contact with you as the data owner who holds the implicit knowledge and must decide how to handle irretrievable errors and inconsistencies. At the end of this process, the dataset is in a healthy state, ready for deposition according to your national guidelines.
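As an illustration of the kinds of checks involved, the following minimal Python sketch flags three of the problems mentioned above: finds that point to non-existent survey units, values that an input control would have rejected, and duplicate rows. The records and field names (`unit_id`, `find_id`, `count`) are hypothetical and do not reflect the actual RHP schema or any contributor's database.

```python
# Hypothetical toy records; field names are illustrative, not the RHP schema.
units = [{"unit_id": "U1"}, {"unit_id": "U2"}]
finds = [
    {"find_id": 1, "unit_id": "U1", "count": 12},
    {"find_id": 2, "unit_id": "U9", "count": 3},   # U9 is not a known unit
    {"find_id": 3, "unit_id": "U2", "count": -4},  # negative count
    {"find_id": 3, "unit_id": "U2", "count": -4},  # exact duplicate row
]


def orphan_finds(finds, units):
    """Return ids of finds whose unit_id matches no survey unit."""
    known = {u["unit_id"] for u in units}
    return [f["find_id"] for f in finds if f["unit_id"] not in known]


def invalid_counts(finds):
    """Return ids of finds with a count that input controls would reject."""
    return sorted({f["find_id"] for f in finds if f["count"] < 0})


def duplicate_rows(finds):
    """Return ids of rows that repeat an earlier row field for field."""
    seen, dupes = set(), []
    for f in finds:
        key = tuple(sorted(f.items()))
        if key in seen:
            dupes.append(f["find_id"])
        seen.add(key)
    return dupes


print(orphan_finds(finds, units), invalid_counts(finds), duplicate_rows(finds))
```

Checks like these only locate the problems; deciding how to resolve each one still requires the data owner's knowledge.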
Once the integrity problems have been solved, the term lists in your dataset (site typology, chronological framework, ceramic classifications) must be ‘mapped’ to those in the RHP database schema. This process involves a number of spreadsheet tables in which you specify how each of your terms for periodisation/chronology, site types, and other variables equates to one of the terms in the RHP classification. These tables provide definitions and criteria for each RHP term to help you, the data owner, decide about the best mapping.
For example, your dataset might contain a site type ‘farm’, which you would map onto the RHP site class ‘single rural estate of small size and low socio-economic status’ after checking the definition and criteria for this class. Depending on how many unique terms you use in your own dataset, this can be a more or less laborious process. When the mapping is complete, your dataset will have become queryable together with all other survey datasets in the RHP database.
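The mapping step can be pictured as a lookup table from local terms to RHP classes. The sketch below, in Python, applies such a table and reports the terms that still need a decision. Only the ‘farm’ row comes from the example above; the other local terms, the `normalise` helper, and the function names are assumptions made for illustration.

```python
# Mapping table as a data owner might fill it in. Only the 'farm' row is
# taken from the text; everything else here is hypothetical.
mapping = {
    "farm": "single rural estate of small size and low socio-economic status",
}

# Site-type values as they might appear in a contributor's dataset.
local_site_types = ["farm", "Farm ", "villa", "farm"]


def normalise(term):
    """Reduce spelling variants caused by stray spaces or capitalisation."""
    return term.strip().lower()


def apply_mapping(terms, mapping):
    """Return (mapped, unmapped): RHP class per local term, and leftovers."""
    mapped, unmapped = {}, set()
    for term in terms:
        key = normalise(term)
        if key in mapping:
            mapped[term] = mapping[key]
        else:
            unmapped.add(key)
    return mapped, sorted(unmapped)


mapped, unmapped = apply_mapping(local_site_types, mapping)
print(unmapped)  # terms that still need a mapping decision
```

Listing the unmapped terms first gives a quick estimate of how laborious the mapping will be for a given dataset.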
The core of the project is formed by a PostgreSQL geodatabase that allows us to query the dataset both by attribute and location. It consists of two parts: first, the cleaned and migrated original databases of the various individual survey projects; and second, the integrated dataset, which includes key attributes from these source databases on survey strategies, sites, and artefacts. The latter have been mapped to accompanying overarching site and artefact typologies and chronologies, and thus allow us to query multiple datasets.
The database is split into three main parts:
- everything relating to locations (sites, subsites, and survey units) and their interpretations over time,
- information related to the acquisition activity: the method of survey, ground conditions, type of visit, and general acquisition method of any material finds, and
- descriptions of the material finds themselves.
Sites, subsites, and units
Sites are identified through material finds (in case of a field survey), and/or by some other acquisition method (e.g. geophysics). They may have one or more interpretations to reflect the range of different functions (e.g. habitation and storage facility). Interpretations may vary per period.
Subsites are well-identified, coherent parts of a larger site, such as the individual buildings of a villa complex. In general, all considerations mentioned for sites also apply to subsites.
Units are the administrative shapes overlaid onto the landscape and reflect the areas that have been surveyed. In contrast to sites and subsites, units do not hold an interpretation.
The database structure supports two different types of chronological queries. One uses the start and end years assigned to each historical period or year range block; the other queries historical periods by name. Both return all interpretations assigned to that period or year range.
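The two query types can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL geodatabase. The table and column names, the period names (‘Period A’ to ‘Period C’), and their year ranges are all hypothetical, as is the convention of storing years BC as negative numbers.

```python
import sqlite3

# In-memory stand-in for the real database; all names and values below
# are illustrative and not the RHP schema or chronology.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE period (
    name TEXT PRIMARY KEY, start_year INTEGER, end_year INTEGER);
CREATE TABLE interpretation (
    site_id TEXT, site_class TEXT, period TEXT REFERENCES period(name));
INSERT INTO period VALUES
    ('Period A', -500, -301), ('Period B', -300, -101), ('Period C', -100, 100);
INSERT INTO interpretation VALUES
    ('S1', 'habitation', 'Period A'),
    ('S1', 'storage facility', 'Period B'),
    ('S2', 'habitation', 'Period C');
""")


def by_period_name(name):
    """All interpretations assigned to the named period."""
    sql = ("SELECT site_id, site_class FROM interpretation "
           "WHERE period = ? ORDER BY site_id, site_class")
    return con.execute(sql, (name,)).fetchall()


def by_year_range(start, end):
    """All interpretations whose period overlaps the years start..end."""
    sql = ("SELECT i.site_id, i.site_class FROM interpretation i "
           "JOIN period p ON p.name = i.period "
           "WHERE p.start_year <= ? AND p.end_year >= ? "
           "ORDER BY i.site_id, i.site_class")
    return con.execute(sql, (end, start)).fetchall()


print(by_period_name("Period B"))
print(by_year_range(-350, -200))
```

The year-range query here treats any overlap between a period and the requested range as a match; a stricter rule (for example, full containment) would be an equally plausible design choice.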
Each participating survey project has developed a different set of interpretative classes, which are retained within the RHP database but are mapped to an overarching RHP hierarchy (e.g., Habitation → High-status / complex isolated settlement → rich farm) to unify the terminology and to allow for queries. All interpretations have an RHP definition to help guide the mapping process.
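One simple way to represent such a hierarchy is a table in which each class points to its parent. The Python sketch below uses the three-level example given above; the parent-pointer representation and the function names are assumptions and not necessarily how the RHP database stores its hierarchy.

```python
# Each class points to its parent; the top level has no parent.
# The three terms come from the example in the text; the structure
# itself is an illustrative assumption.
parent = {
    "Habitation": None,
    "High-status / complex isolated settlement": "Habitation",
    "rich farm": "High-status / complex isolated settlement",
}


def path(term):
    """Return the chain of classes from the top level down to term."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent[term]
    return list(reversed(chain))


def falls_under(term, ancestor):
    """True if term is the ancestor itself or sits anywhere beneath it."""
    return ancestor in path(term)


print(path("rich farm"))
print(falls_under("rich farm", "Habitation"))
```

With this structure, a query for a high-level class such as Habitation can also retrieve every more specific class mapped beneath it.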
All survey data are the result of some research activity, such as desktop study, fieldwalking, or airphoto interpretation. This activity must be recorded in the database and be usable as a querying criterion. Legacy survey data in particular, which were acquired in many different ways, must be identifiable as such to avoid data quality issues. Since modern fieldwalking may be conducted according to widely varying protocols, and the results may be significantly biased by visibility and accessibility factors, the RHP database also records details of the sampling protocols, land use/land cover, and visibility parameters for each surveyed unit.
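Recording these parameters makes it possible to restrict an analysis to units that were surveyed under comparable conditions. The Python sketch below shows the idea; the `UnitVisit` record, its field names, the land-use values, and the 0 to 1 visibility scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitVisit:
    """One visit to a survey unit (hypothetical fields, not the RHP schema)."""
    unit_id: str
    method: str        # e.g. "fieldwalking", "desktop study"
    land_use: str
    visibility: float  # assumed scale: 0 (nothing visible) to 1 (fully visible)


visits = [
    UnitVisit("U1", "fieldwalking", "ploughed", 0.8),
    UnitVisit("U2", "fieldwalking", "pasture", 0.2),
    UnitVisit("U3", "desktop study", "unknown", 0.0),
]


def comparable_units(visits, method="fieldwalking", min_visibility=0.5):
    """Ids of units acquired by the given method with adequate visibility."""
    return [v.unit_id for v in visits
            if v.method == method and v.visibility >= min_visibility]


print(comparable_units(visits))
```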
Material finds
The materials recorded and collected during surveys are classified and described in many different, often somewhat overlapping ways. The RHP database provides classifications based on 1) material, 2) form, and 3) type. As the bulk of the materials encountered in field survey is pottery, these classifications specify:
- Wares, including, where relevant, fabrics;
- Vessel forms (shapes), including functions if known;
- Vessel types according to published typologies.