
From siloed data to shared knowledge: How WorldFAIR is shaping the future of research

Scientists often face challenges in accessing and using research data due to inconsistent terminology, unstructured formats, and missing details. The WorldFAIR project addresses these issues by promoting the FAIR (Findable, Accessible, Interoperable, Reusable) principles to improve data accessibility and foster collaboration across disciplines. At the core of the project are 11 case studies covering a broad range of scientific fields and global communities.

Digital tools offer ever-growing opportunities for collaborative science taking on global challenges – but too often, valuable data needed to inform that work remains out of reach for researchers.

Data may be buried in an unsearchable collection, encoded with idiosyncratic terminology or in a way that can’t easily be made to work with other data – or not readily usable because scientists can’t verify details about the data itself, like its origins or terms of usage.

“This is a problem which is actually as old as science itself,” explains Simon Hodson, the executive director of the Committee on Data (CODATA) of the International Science Council (ISC), which works to improve the availability and usability of data.

These data problems can limit research opportunities, and waste time and money. According to research published by the European Commission in 2018, cleaning up poor-quality data to make it usable is by far the most time-consuming task for an average data analysis project, and can amount to 80% of the total effort.

The WorldFAIR project, a collaboration between CODATA and the ISC, took on this problem. The project aimed to “make data work” by encouraging the adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, fostering better data management and research supported by machine-assisted analysis.

With the project wrapping up, CODATA aims to continue and expand the initiative with WorldFAIR+, which will include new partners and international case studies putting into practice lessons learned over the two-year WorldFAIR project. 

The new phase will be structured as a “federation” of projects, providing a framework for collaboration where scientists can share technical expertise and build on each other’s work. CODATA is inviting potential partners to suggest case studies and get involved.

Data interoperability case studies

CODATA’s initial work, which provided the basis for WorldFAIR, started in 2017 with support from the ISC and funding from the China Association for Science and Technology. That formative work included workshops that led to three case studies, each focused on data use in a specific field: infectious diseases, urban planning and disaster risk reduction. In the project’s initial stages, CODATA also developed a key partnership with the Data Documentation Initiative (DDI).

Building on these efforts, CODATA secured funding from the European Commission for WorldFAIR. The project supported 11 case studies examining data use in a wide range of fields – including cultural heritage, nanomaterials and ocean science. The case studies spanned 13 countries, including Brazil, Kenya, New Zealand and the U.S.

Lessons learned from the project formed the basis of 11 policy recommendations to improve the use and availability of data for science, and led to the development of the Cross-Domain Interoperability Framework (CDIF), which aims to make data from different scientific fields more interoperable. 

At the same time, CODATA has published new Research Data Management Terminology, which provides clear definitions of terms used in the field; those terms have now been published as a machine-readable “FAIR vocabulary”, and will soon be available online in a more easily human-readable format. 

Each of the 11 case studies also generated its own reports and guidance for data use, aiming to make recommendations relevant across different domains of science.

One of the case studies looked at agricultural biodiversity, focusing on pollination – a field where the model for describing and categorizing data is still being defined. Building on data and input from colleagues around the world, researchers from half a dozen countries – Brazil, Kenya, Argentina, the U.S., the UK and the Netherlands – developed a comprehensive guide and set of tools for data related to how plants and pollinators interact.

It’s an extremely specific topic, but one that is relevant nearly everywhere, to scientists in any number of different fields – who can now benefit from a unified, standard way to approach the data, making it easier to build on colleagues’ work and accelerate their own research. 

“Moving from diverse approaches and siloed initiatives to widely available FAIR plant-pollination interactions data for scientists and decision-makers will enable the development of integrative studies that enhance our understanding of species biology, behaviour, ecology, phenology, and evolution”, write researchers who worked on the case study.

In another case study, researchers looked at disaster risk reduction. “As climate change and increased populations are likely to increase both the severity and frequency of disaster, the need for reliable data to inform our responses becomes ever more critical,” they write. 

Scientists and national and international agencies working on disaster risk look to the past to estimate the impact of possible future events, and understand how to mitigate and recover afterward. They also draw on data being churned out constantly by sensors on Earth and on satellites, run by public and private sources.

But in the case study, researchers found it difficult to get the kind of information needed to make accurate assessments, because much of the relevant data doesn’t fit the FAIR data principles. Vital information is often missing – like the number of people injured in a disaster, or how quickly the event unfolded. In other cases, national authorities use their own methods to calculate key data points without showing their work, making it difficult for others to compare.
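The kind of gap described above can be made concrete with a small completeness check. The following is a minimal sketch, not a tool from the case study: the field names (`people_injured`, `onset_hours`, `method_reference`) and the sample records are hypothetical, chosen only to mirror the examples of missing injury counts, event timing and undocumented calculation methods.

```python
from typing import Any

# Hypothetical field names, chosen only for illustration.
REQUIRED_FIELDS = ("event_id", "people_injured", "onset_hours", "method_reference")


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]


# Made-up sample records; the second lacks an injury count and a method reference.
records = [
    {"event_id": "flood-001", "people_injured": 12, "onset_hours": 6,
     "method_reference": "https://example.org/method"},
    {"event_id": "quake-002", "people_injured": None, "onset_hours": 0.5},
]

# Map each event to the list of fields a reuser would find missing.
report = {r["event_id"]: missing_fields(r) for r in records}
print(report)
```

A zero value counts as present here, since "no injuries recorded" is information in its own right, whereas a missing or empty value is not.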

Based on their extensive research, the case study team made a series of recommendations for practices that should make it easier to make evidence-based policy decisions in this increasingly urgent field – “a fundamental step toward building safer, more resilient communities and nations”, they write. 

Researchers with the International Union of Pure and Applied Chemistry (IUPAC) took on a case study looking at how data and terminology related to chemicals can be made more easily usable for both humans and machines. 

IUPAC has more than a century of experience convening chemists to define and standardize the way scientists in the field work with and talk about chemicals. But as digital tools – and increasingly AI and related tech – offer new ways to work, the IUPAC case study looked at how those standards could be made more efficient, and how they could make it easier for other scientists to re-use chemistry data.

One of the case study’s products was a “cookbook”, an open resource of guidelines to help scientists – including students, teachers and working professionals – understand how to work with chemistry data, and how to make their own data more accessible to others. 

The project also described an ambitious new open digital protocol that could plug many different global chemical databases together, allowing scientists to find and access data with a single query – and equally, to check whether their own data is machine-readable. 
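The "single query" idea can be illustrated in miniature. This is only a sketch under stated assumptions, not the protocol the case study describes: the two "databases" are in-memory stand-ins, and the identifiers, field names and function names are all hypothetical.

```python
from typing import Callable

# Each "database" is a stand-in search function; real services would sit
# behind a shared protocol rather than live in memory.
Source = Callable[[str], list[dict[str, str]]]


def make_source(name: str, rows: list[dict[str, str]]) -> Source:
    """Build a search function over one hypothetical database."""
    def search(identifier: str) -> list[dict[str, str]]:
        # Tag each hit with the database it came from.
        return [dict(row, source=name) for row in rows if row["identifier"] == identifier]
    return search


def federated_search(identifier: str, sources: list[Source]) -> list[dict[str, str]]:
    """Send the same query to every source and merge the results."""
    results: list[dict[str, str]] = []
    for search in sources:
        results.extend(search(identifier))
    return results


# Made-up records keyed by a shared, hypothetical identifier scheme.
sources = [
    make_source("db_a", [{"identifier": "chem:0001", "melting_point_c": "0"}]),
    make_source("db_b", [{"identifier": "chem:0001", "density_g_cm3": "1.0"},
                         {"identifier": "chem:0002", "density_g_cm3": "0.8"}]),
]

hits = federated_search("chem:0001", sources)
print(hits)
```

The sketch only works because both sources use the same identifier for the same substance, which is the kind of shared standard the case study is concerned with.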

Building a shared language for scientific data

Bringing together scientists to talk about the data they produce, and try to understand how others work with their own data, has been eye-opening, Hodson explains. 

And by setting out clear standards and definitions, scientists are not just helping out current research, but also making it easier for successive generations to build on their work – perhaps in ways the original authors never considered, he adds.

“Something that we found in WorldFAIR was just how fascinating and useful it was simply to have these conversations, to get all the case studies in a room and have them talking about their data and what they do and how it works and how they describe it – and in some instances identifying connections that we hadn’t necessarily imagined in advance,” he says. 



Image by Taylor Vick on Unsplash.


Disclaimer
The information, opinions and recommendations presented in our guest blogs are those of the individual contributors, and do not necessarily reflect the values and beliefs of the International Science Council.

