FAIR data and other requirements
FAIR principles
Along the way to anchored data, researchers can decide, independently and with the support of various tools, whether the data will be openly available, whether, at what stage of the project, and to what extent the data will be published, and at what point they, as authors, will no longer benefit from it.
Not all data can be shared openly. The sharing of certain data may be in conflict with the rules on the protection of personal data or in conflict with copyright law.
The achievable goal for data is "As open as possible, as closed as necessary".
A suitable form of publishing research data is described by the FAIR principles. The principles emphasize making data available for automated processing by computer systems without human intervention. FAIR data respects the realities of practical data sharing and is therefore not necessarily open data in all circumstances.
Findable data is stored in a suitable location that allows anyone to find it. The data is described by metadata and provided with a unique and persistent identifier.
- The data have a persistent identifier
- The data have a sufficient metadata description
- Metadata is online (catalog, data repository)
- Metadata has a persistent identifier attached
Accessible data should be available under clearly defined conditions. If the data itself is not available, at least a metadata record should be.
- The data, or at least the metadata, can always be retrieved using a persistent identifier
- The protocol used to retrieve the data follows recognized standards
- If necessary, authentication and authorization are required for access
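Retrieval by persistent identifier can be illustrated with a DOI. The sketch below only builds the request and does not send it. It assumes the public resolver at https://doi.org/ and the CSL JSON media type that many DOI registration agencies offer through content negotiation; whether a particular DOI supports it depends on the agency. The DOI shown is a made-up placeholder, and the function names are illustrative only.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

def doi_to_url(doi: str) -> str:
    """Turn a DOI such as '10.1234/abc' into a resolvable URL."""
    doi = doi.strip()
    # Accept the common 'doi:' prefix and full resolver URLs as input.
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + quote(doi, safe="/")

def metadata_request(doi: str) -> Request:
    """Build (but do not send) a request for machine-readable metadata."""
    return Request(
        doi_to_url(doi),
        # Assumption: the registration agency supports this media type.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )

# Hypothetical DOI, used only to show the shape of the request.
req = metadata_request("doi:10.1234/example-dataset")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup. If the data sits behind authentication, the metadata record should still be reachable this way.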
Interoperable (meta)data is linked to other (meta)data through recognized standards and formats so that it can be combined and shared.
- Data are in common and ideally open formats
- Metadata follows relevant standards
- Where possible, controlled vocabularies, keywords, thesauri, etc. are used for the description
- References and links to other related data are provided
Reusable (meta)data should be described in detail so that it can be interpreted correctly. (Meta)data should meet the standards of the scientific field. The data is shared under the least restrictive, clearly defined license.
- The data are well described
- The data are licensed
- Relevant field-specific standards are used
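The four checklists above can be turned into a rough self-check on a metadata record. The following is a minimal sketch, not an official FAIR assessment: the field names (identifier, access_rights, and so on) are assumptions chosen for illustration and do not follow any particular metadata schema.

```python
def fair_gaps(record: dict) -> list[str]:
    """Return the checklist items that the metadata record does not yet satisfy."""
    checks = {
        "findable: persistent identifier": bool(record.get("identifier")),
        "findable: descriptive metadata": bool(
            record.get("title") and record.get("description")
        ),
        "accessible: access conditions stated": bool(record.get("access_rights")),
        "interoperable: file format given": bool(record.get("format")),
        "interoperable: keywords from a vocabulary": bool(record.get("keywords")),
        "reusable: license": bool(record.get("license")),
    }
    return [item for item, ok in checks.items() if not ok]

# Hypothetical record; the DOI is a placeholder and the license is missing.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Example survey dataset",
    "description": "Responses to a questionnaire, one row per respondent.",
    "access_rights": "open",
    "format": "text/csv",
    "keywords": ["survey", "education"],
}

for gap in fair_gaps(record):
    print("missing:", gap)
```

A check like this only confirms that fields are present. Whether the description is actually sufficient still needs a human reader.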
More information is available on the GO FAIR initiative website.
Data and metadata format requirements
Describe important data well so that it is easy to find your way around it, both for yourself and, if the data will be shared, for others. A useful question to answer is: "What would I need to know to work with this data in 10 years?"
The metadata description of a specific dataset also includes persistent identifiers, such as a DOI for the dataset or an ORCID for the author.
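Identifiers typed into metadata by hand are easy to get wrong. An ORCID iD carries a check character computed with ISO 7064 MOD 11-2, so a typo can often be caught before the record is published. The sketch below verifies only the checksum; it does not confirm that the iD is registered or belongs to the named author. The function names are illustrative.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check the format (16 characters, hyphens optional) and the check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()

# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
```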
It is also advisable to attach a sample citation showing how the published data should be cited.
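A sample citation can be generated directly from the metadata, which keeps the two consistent. The sketch below uses one commonly seen pattern (creators, year, title, publisher, identifier); the expected pattern varies by field and repository, so treat the layout as an assumption. All names and the DOI in the example are placeholders.

```python
def format_citation(creators: list[str], year: int, title: str,
                    publisher: str, doi: str) -> str:
    """Build a data citation: Creators (Year). Title. Publisher. DOI link."""
    authors = "; ".join(creators)
    return f"{authors} ({year}). {title}. {publisher}. https://doi.org/{doi}"

# Hypothetical dataset details, for illustration only.
citation = format_citation(
    creators=["Novák, J.", "Svobodová, A."],
    year=2023,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example-dataset",
)
print(citation)
```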
It is recommended to follow the practices of the given field and to use appropriate formats and standardized field-specific vocabularies, so that the data can be well understood and easily used, and so that misinterpretation is prevented as far as possible. Suitable standards can be found, for example, in the DCC List of Metadata Standards, or on the open data pages of the Czech public administration (open formal standards). The OpenAIRE page deals with the rules and standards for repositories.
Data anonymization
When publishing data, their anonymization must be addressed where relevant. Anonymization is the process of concealing the identity of all persons or institutions studied, in all documents that are outputs of the research. For example, you can use the Amnesia tool on the OpenAIRE website.
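To make the idea concrete, the sketch below removes direct identifiers, generalizes age into ten-year bands, and then counts how many records share each combination of the remaining indirect identifiers. It is a simplified illustration under assumed column names, not a description of how any particular tool works and not a substitute for a proper disclosure-risk or legal assessment.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}    # assumed column names, removed entirely
QUASI_IDENTIFIERS = ("age_band", "city")  # assumed columns that could identify in combination

def generalize(row: dict) -> dict:
    """Drop direct identifiers and replace the exact age with a ten-year band."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    low = out.pop("age") // 10 * 10
    out["age_band"] = f"{low}-{low + 9}"
    return out

def smallest_group(rows: list[dict]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in rows)
    return min(groups.values())

# Made-up records, for illustration only.
rows = [
    {"name": "A", "email": "a@example.org", "age": 34, "city": "Brno", "score": 7},
    {"name": "B", "email": "b@example.org", "age": 38, "city": "Brno", "score": 5},
    {"name": "C", "email": "c@example.org", "age": 41, "city": "Praha", "score": 9},
    {"name": "D", "email": "d@example.org", "age": 47, "city": "Praha", "score": 6},
]

released = [generalize(r) for r in rows]
k = smallest_group(released)
print(released[0])
print("smallest group size:", k)
```

A smallest group of size 1 means that at least one person is unique on the indirect identifiers and could potentially be re-identified, so the data would need further generalization or suppression before release.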
If the data cannot be anonymized, it is necessary to define precisely which persons are authorized to work with the data provided by the persons or institutions studied. Researchers must also secure research databases against unauthorized access to the datasets.
License for research data
Providing a license for open research data (ORD) is an important condition for the publication and further free use of the data. A suitable public license is, for example, Creative Commons (CC) or Open Data Commons. Repositories usually have a license already set up. If the data is not stored in a repository, the license should be attached to the published data, preferably in the metadata.
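When the data is published outside a repository, the license can be recorded in the metadata in a machine-readable way. The sketch below assumes a simple JSON metadata record with a hypothetical "license" field; the identifiers follow the SPDX license list and the URLs point to the Creative Commons deeds. The function name and record layout are illustrative only.

```python
import json

# SPDX identifiers mapped to the corresponding Creative Commons URLs.
LICENSES = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}

def with_license(metadata: dict, spdx_id: str) -> dict:
    """Return a copy of the metadata with a license identifier and URL attached."""
    if spdx_id not in LICENSES:
        raise ValueError(f"unknown license identifier: {spdx_id}")
    return {**metadata, "license": {"id": spdx_id, "url": LICENSES[spdx_id]}}

# Hypothetical metadata record.
metadata = {"title": "Example survey dataset", "creator": "Novák, J."}
licensed = with_license(metadata, "CC-BY-4.0")
print(json.dumps(licensed, ensure_ascii=False, indent=2))
```

Storing both the identifier and the URL lets software recognize the license while a human reader can follow the link to the full terms.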
A license can be used only where copyright-protected content is provided, or a database protected by the special rights of the maker of the database. Where there is no content that could be licensed, the CC clause does not bind anyone in any way. Simple data is not protected by intellectual property rights.
Creative Commons licenses can also be used only if the data provider is authorized to handle the work in this way (e.g. it is sufficiently authorized to grant sub-licenses, or the work is an employee work of the provider).
In accordance with the principles of open data, the two licenses mentioned below are recommended.
The CC-BY 4.0 license allows anyone to use the data for any purpose, provided that the author is named. This license is advisable if the research data qualifies as a work.
The CC0 (free work) license allows anyone to use the data in any way. Naming the author is considered standard practice, but in this case it is not legally enforceable. The author does not provide any guarantees for the work and disclaims liability for all possible uses of the work, to the fullest extent possible. Czech law does not allow this license to be used, but it can be used outside the Czech Republic. More…