Heritage data-centric research: are FAIR data fair enough?
In the current trend for e-Science, i.e. collaborative, computationally- or data-intensive research, archaeology is not a laggard. A number of initiatives are addressing how to manage and use data produced by heritage research, most notably ARIADNE in the archaeological domain (https://www.ariadne-infrastructure.eu), which presently involves leading research centres from across Europe in creating a comprehensive and integrated archaeological data infrastructure, and has so far registered a little less than 2,000,000 archaeological datasets. This infrastructure, implemented by ARIADNE, is bringing archaeology out of the “long tail of science”, i.e. those disciplines that make little use of data-centric research. It is also revolutionising the concept of Big Data: not relatively few datasets, each holding terabytes of numbers, as in nuclear physics, but millions of small datasets, all potentially relevant to a specific research question, yet including a large (and unknown) majority that will probably prove irrelevant altogether.
E-Science relies on the well-known FAIR principles (https://www.force11.org/fairprinciples), stating that data should be Findable, Accessible, Interoperable and Re-usable. While “F”, “A” and “I” mainly depend on the technical way in which data and metadata are generated, stored, managed and curated, the “R” has less technical (but no less important) implications. It involves theoretical, methodological and epistemological aspects that have not received enough attention in the current debate. It has been argued that e-science discovery could be modelled as a deterministic discovery process; yet even in this perspective, simply modelling the provenance of data is not sufficient: the provenance of the hypotheses and results generated by analysing the data needs to be modelled as well.
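The point about the two levels of provenance can be made concrete with a minimal sketch. The class and field names below are hypothetical illustrations, not part of any standard: they merely show that a documented hypothesis must carry its own provenance in addition to pointing back at the datasets it was derived from.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, minimal provenance records: one for a dataset, one for a
# hypothesis derived from it. Field names are illustrative, not a standard.

@dataclass
class DataProvenance:
    dataset_id: str
    creator: str
    method: str          # how the data were collected or digitised

@dataclass
class HypothesisProvenance:
    statement: str
    derived_from: List[DataProvenance]
    analysis: str        # the analytical step that produced the hypothesis

    def sources(self) -> List[str]:
        """Trace the hypothesis back to the datasets it rests on."""
        return [d.dataset_id for d in self.derived_from]

survey = DataProvenance("ds-001", "Excavation team A", "field survey, 2019")
claim = HypothesisProvenance(
    statement="Site occupied during the Late Bronze Age",
    derived_from=[survey],
    analysis="typological comparison of ceramic finds",
)
print(claim.sources())  # ['ds-001']
```

Recording the `analysis` step alongside the data links is what distinguishes modelling the provenance of a result from merely modelling the provenance of its inputs.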
Thus, to reuse data in cultural heritage it is necessary to expand the “R” facet of the FAIR principles at least into R3: Re-usable, Relevant and Reliable. Judging relevance and reliability may appear obvious to a human eye, but it is not to machine processing. Data reliability depends on a chain of trust that needs to be adequately supported by documentation, and in this regard the CIDOC CRM may play a key role. Whereas in the past, reliance on previous discoveries published in journals and books rested on the academic practice of peer review and on the authoritativeness of the author and of the publication, re-using data created by others still lacks a comparable good practice.
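A sketch of what a machine-checkable “chain of trust” might look like, loosely inspired by the CIDOC CRM idea of documenting an assertion together with the actor who made it and the evidence it rests on (as in its attribute-assignment construct). All keys, identifiers and names here are hypothetical simplifications, not actual CRM classes or properties:

```python
# Hypothetical assertion record: an attributed claim about a find, with the
# author of the claim and the evidence behind it made explicit. Names and
# identifiers are invented for illustration.
assertion = {
    "assigned_attribute_to": "find:amphora-42",
    "assigned": {"dating": "2nd century BCE"},
    "carried_out_by": "Dr. Example (hypothetical)",               # who claims it
    "based_on": ["dataset ds-001", "stratigraphic report SR-7"],  # evidence
}

def is_documented(a: dict) -> bool:
    """A crude but machine-processable reliability check: an assertion
    counts as documented only if both author and evidence are recorded."""
    return bool(a.get("carried_out_by")) and bool(a.get("based_on"))

print(is_documented(assertion))  # True
```

Such a check is deliberately crude, but it illustrates the gap: a human reviewer judges reliability from context and reputation, while a machine can only evaluate what the documentation explicitly records.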
The session will discuss these aspects and propose ways to address the issue. Contributions will range from cultural heritage practice (“What would you need in order to rely on somebody else’s data?”) to semantics (“What would you suggest documenting, in order to support reliability?”). Both aspects will be analysed in light of the CRM: does it already provide a sufficiently rich toolbox, or are additions required? If so, which ones?