
What’s on the horizon for scientific data services? The latest from the World Data System

The World Data System promotes long-term stewardship of – and universal and equitable access to – quality-assured scientific data and data services, products, and information across all disciplines.

The last year has been a period of transition for the World Data System (WDS), an ISC Affiliated Body.

The International Programme Office (IPO) has moved to Knoxville, Tennessee, and Meredith Goins has been appointed as its Executive Director. Three other staff members were recruited, and the programme’s Scientific Committee also has several new members.  

We caught up with David Castle, Chair of the WDS Scientific Committee; Karen Payne, Director of the WDS International Technology Office; Suzie Allard, Director of the Center for Information & Communication Studies at the University of Tennessee, where the WDS IPO is now based; and Meredith Goins, to find out more.

What’s been the impact of the recent changes for WDS activities? 

David: This is a period of consolidation and focus. Four or five years ago, we created the WDS International Technology Office (ITO) at the University of Victoria, at Ocean Networks Canada, which is a major research facility and a member of WDS. We recruited Karen to be the Associate Director of the ITO, and that started us down a path of being able to provide a greater volume of more diversified services to our membership. In the last year, the IPO has moved from Tokyo to Tennessee with support from the University of Tennessee and Oak Ridge National Laboratory, as well as from the Department of Energy (DOE).

Suzie: The two offices are working very well together, and the support we can provide members is even greater because activities are being coordinated so closely. That gives us great potential for the future. 

David: About half of the membership on the Scientific Committee has also changed in the last year. We’ve added some key new individuals to join returning members, all of whom are ensconced in the world of data repositories. Over the past few years we’ve put the WDS on a solid footing from which we will be able to launch programmatic activity, and to bring about an alignment with the ISC Action Plans.

We’re trying to understand where repositories and data are now and where they’re going to be going in the coming period. This includes raising questions about the provenance of data, how it’s stewarded, and how it’s kept secure. We’re working on related technical aspects such as FAIR data objects, in partnership with CODATA, and how to work together to bring about standards and interoperability expectations for those. 

We’re also confronting a challenge that doesn’t always get mentioned: There’s a belief and an expectation that once things are online and made available, they’ll persist for free. This is of course not true. To meet the expectation that data will be open and accessible to the greatest extent possible, we need to have frank conversations about where the resources are going to come from. This is an issue for our members, and a major priority for us is how we define the tremendous value that repositories bring nationally and internationally in a way that will help repositories to engage with funders who can support sustainable plans for making that data available.  

Another major priority is making our membership more globally representative. WDS membership is predominantly from the Global North, and it makes sense for us to collaborate with the ISC and CODATA on taking stock of activities in Africa, Latin and South America and Southeast Asia and potentially identifying new members for WDS. We’re also working with other groups that are providing data services in different modalities than sustaining a repository.  

Meredith: Another way we are making our membership more representative is to identify repositories from a variety of subject areas, in addition to the biological and earth sciences, to increase the diversity of our members. Social sciences and digital humanities repositories are just as valuable as those in the natural sciences. By increasing our membership diversity, we can increase our support for all types of repositories.

Karen: We are having a large push on some federated services. For example, for polar research we have an opportunity to make data from both poles available to researchers in a way that’s completely aligned, which is tremendously exciting: it’s something the community has been working towards for a long time and we’re glad to be a part of that. 

Federated services for polar data come in two parts: the federated search, which has been going on for a long time with traditional metadata harvesting, and a new set of protocols and processes for harvesting metadata that is more web-oriented. It’s less of a traditional catalogue of services, and more along the lines of what you would find with Google Search. The infrastructure we built allows us to send out crawlers to index the landing pages of data repositories that have implemented a particular type of markup on their metadata landing pages. We are providing the ability for researchers to search for data from both the Arctic and Antarctic, and working with the research communities to make sure that the ontologies they implement (the markup) are all aligned as well.
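To make the crawl-and-index step concrete, here is a minimal sketch in Python. The interview does not name the markup involved; the sketch assumes schema.org `Dataset` metadata embedded in the landing page as JSON-LD, which is a common choice for this style of harvesting. The class `JsonLdExtractor`, the function `index_landing_page`, and the sample page are all hypothetical, and a real crawler would also need to fetch pages, respect robots rules, and handle richer metadata.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the crawl


def index_landing_page(html):
    """Return one index entry per Dataset description found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        {"name": r.get("name"), "keywords": r.get("keywords", [])}
        for r in parser.records
        if isinstance(r, dict) and r.get("@type") == "Dataset"
    ]


# Hypothetical landing page, assumed to carry schema.org JSON-LD markup.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Hypothetical sea ice thickness survey",
 "keywords": ["Arctic", "sea ice"]}
</script>
</head><body>Landing page</body></html>
"""

entries = index_landing_page(SAMPLE_PAGE)
```

Because every repository exposes the same fields in the same place, entries harvested from Arctic and Antarctic repositories can go into a single index, which is why aligning the markup across communities matters.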

Securing funding for that kind of work is really tricky. It’s an international project, so there are lots of conversations about funding in different areas. Here in Canada they’re looking at different funding models, both for national investments and also so that they can be part of a global cooperative set of funders. For example, one of the models they are reviewing is the Global Biodata Coalition, which is designed to coordinate global funding for key resources in the life sciences.

We also have a working group within the Research Data Alliance looking at what we call the Global Open Research Commons. There are different national, pan-national and domain-specific organizations that are trying to orchestrate access to, and interoperability of, resources like datasets, software, and computational resources. At the national level it makes sense to have a good governance structure and roadmap for all of their research investments, so you see organizations like the Australian Research Data Commons or the Japanese infrastructure coordinated at the National Institute of Informatics. Pan-nationally you see ambitious projects like the European Open Science Cloud and the African Open Science Platform. And domain-specific organizations like the International Virtual Observatory Alliance, which serves astronomers globally, are very important for supporting their respective research communities. The goal of the RDA group is to create a roadmap for how these commons can share resources seamlessly so that it is easier for scientists to work together globally for the greater good. We’re building on work that has been going on for a long time, but it really feels as though there’s a lot of motivation to bring these pieces together now.

Can you explain what federated search will mean for researchers who are trying to access the data in question, for example for polar research? What will change? 

Karen: Right now researchers have to go to different locations to find data. And then once you find that data, you spend time harmonizing its structure, and then double-checking the content to make sure that you understand the semantic meaning of the measured variables in the data. This is a first attempt to make that process more cohesive and machine-actionable. To my knowledge this is the only portal that allows users to search for data from both poles simultaneously. Right now we are focused on search and discovery of datasets and bringing more repositories into the index. We anticipate that the infrastructure will evolve to support or feed into other initiatives, like the Canadian Consortium for Arctic Data Interoperability (CCADI), which is building enhanced visualization and analytics tools. We want to support our partners, not reinvent the wheel.

Suzie: The IPO is committed to getting the word out about all the kinds of work that Karen is doing and making sure that it’s well disseminated. We’re also working to bring everybody up to speed by hosting workshops or trainings and creating opportunities for people to take part. The ITO is doing cutting edge work together with all these different groups. And the IPO is helping to make sure that everybody learns what’s going on as we continue to build these great repositories.

Where do you see work on repositories and data today? And where is it going? What are the new challenges or things that people will need to be thinking about in the next five to ten years? 

David: There are concrete things that need to be done. One of them is ensuring that our member repositories are secure. That’s a critical factor in being able to ensure the integrity of data, which underpins all science. Another is that volumes of data have grown so significantly that old models of moving data to where you would actually work with it, in a high-performance computing environment, are now getting flipped. It’s now the case that we need to find ways to be able to analyse data in situ, bringing the computation to the data. A challenge is to help WDS repositories become cloud-enabled.

The other part of this is about workforce capacity and competencies, such as mobilizing data scientists, technical research scientists and data stewards. These are evolving roles within the scientific enterprise that need to be monitored carefully so as to ensure that the right competencies are in place, and that we have the education and training to provide to interested people. 

Karen: A lot of people are working on components that would allow researchers to move away from publishing static papers in journals and instead create a reproducible paper that is available online. Somebody could publish a piece of data or do a piece of analysis, then write it up and publish it as a type of easily reusable package that can be taken up by someone else to either reproduce the same results, which is important to make the assertions of science verifiable, or to reuse it in a new way. Someone could take the package, plug in a different piece of data, or change a parameter on a piece of analysis software and create a new result that they publish. So it becomes about an atomization of the data and the software components, so that you can take bits of things and publish them easily. The reproducible paper helps solve issues with reproducibility of results, re-use of data and potentially redundancy of research. 
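As a rough illustration of the package idea Karen describes, the Python sketch below bundles data, a parameter, and an analysis step into one reusable object. Everything in it is hypothetical: `AnalysisPackage`, its `threshold` parameter, and the toy analysis (a mean over values at or above the threshold) stand in for whatever a real reproducible paper would carry.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class AnalysisPackage:
    """Hypothetical bundle of data, a parameter, and an analysis step."""

    data: tuple
    threshold: float

    def run(self):
        # Toy analysis: mean of all values at or above the threshold.
        kept = [value for value in self.data if value >= self.threshold]
        return mean(kept)


original = AnalysisPackage(data=(1.0, 2.0, 3.0, 4.0), threshold=2.0)

# Reproduce: the same package always yields the same result.
reproduced = original.run()

# Reuse: change one parameter, or plug in different data, to get a new result.
variant = replace(original, threshold=3.0)
with_new_data = replace(original, data=(10.0, 20.0))
```

The package is immutable here, so a published result cannot drift; each variation is a new package that can be published and cited in its own right.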

You see that trend in software development, where there’s a disaggregation of the APIs (Application Programming Interfaces) on the back-end, so that you can use portions of them. Within the data management community there’s a similar idea around FAIR digital objects – you don’t want to publish this whole downloadable dataset anymore, you want to provide a data service to every observation or measurement and you want to make those measurements machine actionable, so that you can pick and choose which observations you want to use without a lot of processing on your end – the data should be presented in its most accessible form. 

The components, like the data, need to be disaggregated and atomized and accessible by both humans and machines wherever they are distributed across the globe. From a researcher’s and technologist’s point of view it’s all happening from the bottom up. There’s almost too much to get your mind around, so it becomes about how you make little inroads to make it meaningful. The American Geophysical Union (AGU), in particular, has done a really good job of focusing on computational notebooks as a first step to see how a reproducible paper could happen. That’s a really great use case for what will become much more complex infrastructures. 

It’s a lot to take on, and sometimes it’s hard to know exactly where to put your focus. But hopefully that’s one of the areas where the WDS IPO and ITO can offer real value to our membership.

How can readers find out more about WDS and how they can get involved in your activities or become members? 

David: Meredith has been thinking this through. We’ve stepped up our periodic communications with our members and are improving our website with more regular updates, which will continue. There will also be a whole host of other activities as the IPO gets fully staffed, and once our two-year action plan is published. 

Meredith: In addition to relaunching our social media, we are currently finalizing and testing a redesigned website. Future initiatives include outreach and educational webinars for our WDS member repositories, partners, and associated organizations. Additionally, we have a biweekly newsletter for members and send them time-sensitive communications about opportunities by email, and we look forward to creating an annual report for the organization, something that hasn’t occurred since 2015-2016. We will also co-launch the WDS Data Stewardship prize and the ITO Data Prize at the same time this year to give early-career engineers and scientists two opportunities to show their excellence with data.

Image by NASA via Flickr.
