Definition
The term “data provenance,” sometimes called “data lineage,” refers to a documented trail that accounts for the origin of a piece of data and traces its movement from that origin to its present location. The purpose of data provenance is to tell researchers where research data originated, how it has changed, and what details support confidence in its validity. Provenance records help ensure that data creators are transparent about their work and its sources, and they provide a chain of information through which data can be tracked as researchers reuse other researchers’ data and adapt it for their own purposes.
For example, a molecular biologist may use data derived from public databases, some of which are in turn derived from academic papers and experimental observations. A provenance record preserves this history for each piece of data, including where it came from, who originally collected it, and what modifications or transformations have been applied to it.
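As an illustration only, the record described above could be modeled as a small data structure. The sketch below is a minimal one written in Python; the class name `ProvenanceRecord`, its fields, the `lineage` method, and all identifiers and lab names are hypothetical and are not drawn from any standard or from the literature listed below.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one piece of data."""
    data_id: str
    source: str        # where the data came from
    collected_by: str  # who originally collected it
    # Records for the data this piece was derived from (empty for an origin).
    derived_from: List["ProvenanceRecord"] = field(default_factory=list)
    # Modifications or transformations applied to produce this piece.
    transformations: List[str] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Return data_ids from this record back to its origins, depth-first."""
        trail = [self.data_id]
        for parent in self.derived_from:
            trail.extend(parent.lineage())
        return trail


# Illustrative chain: observation -> paper -> public database -> working data.
observation = ProvenanceRecord("obs-001", "experimental observation", "Lab A")
paper = ProvenanceRecord(
    "paper-001", "academic paper", "Lab A",
    derived_from=[observation],
    transformations=["summarized measurements"],
)
db_entry = ProvenanceRecord(
    "db-001", "public database", "Database curators",
    derived_from=[paper],
    transformations=["normalized units"],
)
analysis = ProvenanceRecord(
    "analysis-001", "working dataset", "Researcher B",
    derived_from=[db_entry],
    transformations=["filtered by organism"],
)

print(analysis.lineage())
# ['analysis-001', 'db-001', 'paper-001', 'obs-001']
```

Each record keeps its own source, collector, and transformations, so walking `derived_from` recovers the full trail from the working dataset back to the original observation.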
Relevant Literature
Werder, K., Ramesh, B., & Zhang, R. (Sophia). (2022). Establishing Data Provenance for Responsible Artificial Intelligence Systems. ACM Transactions on Management Information Systems, 13(2), 22:1–22:23. https://doi.org/10.1145/3503488
Mayernik, M. S., DiLauro, T., Duerr, R., Metsger, E., Thessen, A. E., & Choudhury, G. S. (2013). Data Conservancy Provenance, Context, and Lineage Services: Key Components for Data Preservation and Curation. Data Science Journal, 12, 158–171. http://doi.org/10.2481/dsj.12-039
Viglas, S. D. (2013). Data Provenance and Trust. Data Science Journal, 12, GRDI58–GRDI64. http://doi.org/10.2481/dsj.GRDI-010