Assessing Fitness for Use
Before using a published data package, it is important to consider whether it meets the requirements of the research question under investigation. The scope of this assessment may be wide; fitness encompasses not only how the data may serve the needs of the data user, but also how the data have been previously used, the level of processing or manipulation the data have already received, and the use license the data are released under. When planning to use published data, it is often a good idea to initiate a conversation with the data authors.
The license
What license are the data released under? This is one of the first things to consider, since the license may preclude reuse. See licensing data for more on how licenses affect the use of published data.
Metadata completeness
Metadata should be complete enough to answer questions about the data without leaving uncertainty. Pay special attention to sampling methods and units. Understanding the original context and purpose of the data can indicate whether the data may be reused in similar or related contexts. When metadata are incomplete, consider reaching out directly to the contact listed under the People and Organizations section of the data package full metadata page.
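As a rough illustration of checking sampling methods and units, the sketch below scans an EML-style metadata document for a methods element and lists the unit recorded for each numeric attribute. The inline fragment, the attribute names, and the function name metadata_report are hypothetical and heavily trimmed; real metadata will be larger and may be structured differently, so treat this as a starting point rather than a complete check.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML-style fragment used only for illustration.
SAMPLE_EML = """<dataset>
  <methods>
    <methodStep><description><para>Daily sensor readings.</para></description></methodStep>
  </methods>
  <attributeList>
    <attribute>
      <attributeName>temp_c</attributeName>
      <measurementScale><interval><unit><standardUnit>celsius</standardUnit></unit></interval></measurementScale>
    </attribute>
    <attribute>
      <attributeName>depth</attributeName>
      <measurementScale><ratio></ratio></measurementScale>
    </attribute>
    <attribute>
      <attributeName>site</attributeName>
      <measurementScale><nominal/></measurementScale>
    </attribute>
  </attributeList>
</dataset>"""


def metadata_report(xml_text):
    """Report whether methods are described and which unit each numeric attribute has."""
    root = ET.fromstring(xml_text)
    units = {}
    for attr in root.iter("attribute"):
        name = (attr.findtext("attributeName", default="(unnamed)") or "").strip()
        scale = attr.find("measurementScale")
        if scale is None:
            continue
        # Only ratio and interval scales are expected to carry units.
        numeric = scale.find("ratio")
        if numeric is None:
            numeric = scale.find("interval")
        if numeric is None:
            continue
        unit_name = None
        unit = numeric.find("unit")
        if unit is not None:
            for child in unit:  # e.g. standardUnit or customUnit
                text = (child.text or "").strip()
                if text:
                    unit_name = text
                    break
        units[name] = unit_name
    return {"has_methods": root.find(".//methods") is not None, "units": units}


report = metadata_report(SAMPLE_EML)
print(report)
```

An attribute reported with no unit, or a report with no methods element, is a prompt to contact the data authors before relying on the values.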
Prior research using the data
Have these data been used in other research articles? Understanding how the data have been used previously can help avoid substantial and unexpected overlap with existing work. Known usage of a data package is listed in the Journal Citations section of a data package landing page.
Level of processing
How have these data been processed? Data packages often contain data that have been tailored to answer a specific research question. Previously aggregated data may present challenges when used for novel analyses. If a version of the raw data is not included with the data package, it may be important to consider whether the data have been manipulated in a way that makes them unsuitable for a specific research question.
Explore the data
Data exploration through visualization and statistical summarization provides access to information that is not conveyed in the metadata but is contained in the data themselves. See data exploration for more on tools and techniques.
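A minimal sketch of statistical summarization is shown below, using only the Python standard library. The sample table, the column name temp_c, and the missing-value codes are made up for illustration; in practice the table would be read from a file in the data package, and the missing-value codes should come from the metadata.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a data table from a published package.
SAMPLE_CSV = """site,date,temp_c
A,2020-01-01,4.2
A,2020-01-02,
B,2020-01-01,5.0
B,2020-01-02,-9999
"""


def summarize_column(rows, column, missing_codes=("", "NA", "-9999")):
    """Return count, missing count, min, max, and mean for a numeric column."""
    values = []
    missing = 0
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw in missing_codes:
            # Blank cells and declared missing-value codes are counted, not averaged.
            missing += 1
        else:
            values.append(float(raw))
    return {
        "n": len(values),
        "missing": missing,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "mean": statistics.mean(values) if values else None,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_column(rows, "temp_c")
print(summary)
```

A summary like this can surface undocumented missing-value codes or implausible ranges; if -9999 were not treated as missing here, it would dominate both the minimum and the mean.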
Engage with the author
When using data collected by someone else, it is both courteous and wise to make direct contact before incorporating the data into a new analysis. Contact information is available under the People and Organizations section of the data package full metadata page.