

  • Archive: organization, place, or collection that stores information for long-term preservation so that it can be accessed and reused by a designated community.


  • Backup: practice of keeping additional copies of your research data in separate physical or cloud locations, apart from the files in your working storage.


  • Code-based data processing & analysis: giving instructions to a computer to convert raw data into machine-readable form and subsequently reusing the data to discover useful information, inform conclusions, and support decision-making.
  • Controlled vocabulary: providing a consistent way to describe data, where the terms consist of standardized and organized arrangements of words and phrases to describe domain-specific concepts. The terms are usually displayed in an alphabetical list of terms and can include subject headings, thesauri or glossaries, ontologies, and taxonomies.
  • Custom code & scripts: text-based commands used via a command-line interface.
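
As an illustration of the controlled-vocabulary entry above, the sketch below validates a metadata field against a small vocabulary; the terms and the function name are hypothetical examples, not a real standard:

```python
# Minimal sketch: validating a metadata field against a controlled vocabulary.
# The vocabulary terms below are invented examples, not a published standard.
ORGANISM_VOCABULARY = {
    "Homo sapiens",
    "Mus musculus",
    "Drosophila melanogaster",
}

def validate_term(term: str, vocabulary: set) -> bool:
    """Return True if the term exactly matches an entry in the vocabulary."""
    return term in vocabulary

print(validate_term("Mus musculus", ORGANISM_VOCABULARY))  # standardized term: accepted
print(validate_term("mouse", ORGANISM_VOCABULARY))         # free-text synonym: rejected
```

Requiring exact matches is what makes descriptions consistent across datasets; synonyms are handled by mapping them to the preferred term, not by accepting them directly.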


  • Data anonymization: processing of personal data where direct and indirect personal identifiers are completely and irreversibly removed.
  • Data interoperability: ability of a dataset to work with other datasets or systems without special effort on the part of the user.
  • Data Management Plan (DMP): formal and living document to describe the data, their generation and processing during the project, as well as how the data and research results will be archived afterwards to remain available, usable and comprehensible.
  • Data pseudonymization: processing of personal data where the majority of identifying fields are replaced by pseudonyms (i.e. artificial identifiers); unlike anonymization, the original identities can be restored using separately kept additional information.
  • Data repository: location where digital (and physical) objects are stored and documented, and which enables the separate publication and archiving of these objects. Data access can be either open or restricted to a group of users.
  • Digital preservation: act of ensuring continued findability and access to digital material and maintaining it independently understandable and reusable by a designated community, and with evidence supporting its authenticity, for as long as necessary.
  • Dublin Core: domain-agnostic, basic, widely used metadata standard.
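
The difference between the anonymization and pseudonymization entries above can be sketched in code; the records and the pseudonym scheme here are hypothetical:

```python
# Minimal sketch: pseudonymization keeps a separate key table that can
# re-identify subjects; anonymization would discard identifiers irreversibly.
records = [
    {"name": "Alice Example", "age": 34},  # hypothetical personal data
    {"name": "Bob Example", "age": 41},
]

key_table = {}      # pseudonym -> original identifier, to be stored separately
pseudonymized = []
for i, record in enumerate(records, start=1):
    pseudonym = f"SUBJ-{i:03d}"            # artificial identifier
    key_table[pseudonym] = record["name"]
    pseudonymized.append({"subject_id": pseudonym, "age": record["age"]})

print(pseudonymized)
```

As long as the key table exists, the data remain personal data in the legal sense; deleting it (together with any indirect identifiers) is what moves the dataset toward anonymization.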


  • Electronic Lab Notebook (ELN): software meant to document experiments and research data.


  • FAIR data principles: set of guiding principles stating that research data should be Findable, Accessible, Interoperable, and Reusable, by humans and machines alike.


  • git-annex: distributed file synchronization system that allows managing large files with git without storing the file contents in the git repository itself.
  • Good Scientific Practice (GSP): principles, values and standards of behavior and practice that must be achieved and maintained in the delivery of work activities, the provision of care and personal conduct.


  • Identifiable natural person: person who can be identified, directly or indirectly, with the help of an identifier (e.g. an online identifier).
  • Informed consent: process by which a subject voluntarily confirms their willingness to participate in a particular trial, after having been informed of all aspects of the trial that are relevant to the subject’s decision to participate. Informed consent is documented by means of a written, signed and dated informed consent form.


  • Literate programming: code intermingled with a narrative description of the scientific analysis.


  • Metadata: data about data, i.e. structured information that describes a resource and makes it easier to find, understand, and reuse.
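
As a concrete illustration of "data about data", the sketch below attaches Dublin Core-style descriptive fields to a hypothetical dataset; the element names follow the Dublin Core element set, but all values are invented:

```python
# Minimal sketch: a metadata record describing a hypothetical data file.
# Field names follow the Dublin Core element set; the values are examples only.
metadata = {
    "title": "Example measurement series",
    "creator": "Jane Doe",
    "date": "2024-01-15",
    "format": "text/csv",
    "identifier": "local-dataset-001",  # ideally a PID such as a DOI
}

for element, value in metadata.items():
    print(f"{element}: {value}")
```

Recording such fields alongside the data, ideally with terms from a controlled vocabulary and a persistent identifier, is what makes a dataset findable and reusable by others.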


  • Narrative description: detailed, written description of computational analyses.


  • Ontology: a list of terms with curated textual definitions and persistent identifiers. The terms are arranged hierarchically from general to more specific and have defined relationships to other terms within the ontology and to external resources (e.g., synonyms, cross-references).
  • Open Science: practice of making publicly funded research results (e.g. publications, research data, processes) publicly and freely accessible in digital formats, under terms that enable reuse, redistribution, and reproduction of the research and its underlying data and methods.
  • Open source software: software with source code that anyone can inspect, modify, and enhance.


  • Persistent Identifier (PID): globally unique, actionable, and machine-resolvable string that acts as a long-lasting reference to a digital object (e.g. a dataset).
  • Personal data: any information associated with an identified or identifiable natural person (i.e. ‘data subject’).
  • Preregistration: practice of documenting your research plan (i.e. research question and study design) before conducting a scientific investigation, and depositing that plan in a read-only public repository.
  • PUBLISSO – Repository Finder: ZB MED’s curated selection of repositories from re3data.


  • RDMO4Life: dedicated version of RDMO for research institutions working in the life sciences, with the possibility of customizing questionnaires to subject- or project-specific needs.
  • Research Data Management Organiser (RDMO): open-source web application that supports the structured and collective planning and implementation of research data management and additionally enables the textual output of a data management plan.


  • Storage: act of keeping your research data in a secure location that you can access readily.


  • Version Control Systems (VCSs): software tools that help teams manage changes to file(s) over time.


  • Workflow management system (WMS): software tool designed to help streamline routine processes for optimal efficiency.

Further resources