Thirty years since the inception of the World Wide Web, the culture of sharing and collaboration that it embodied has progressively expanded from open source software, to open access publishing, to open data and entirely open analysis, and to the growing open science movement. There is now an opportunity, and arguably an obligation, to systematically open science and its outputs to a wider range of societal actors, including citizens, to address shared problems and enable the joint creation of actionable knowledge.
2.1 Data-driven interdisciplinarity
(Project in progress)
Many of the major contemporary problems faced by science and society are inherently complex. They concern the operation of systems that exhibit emergent behaviour as a consequence of interactions between their component parts. Some examples include the operation of cities, the human brain, the dynamics of infectious disease, climate change and pathways to sustainability. Researching these challenges almost invariably requires interdisciplinary collaboration. The tools of the digital revolution, now enhanced by the techniques of artificial intelligence, have created unprecedented opportunities to exploit such collaboration by integrating relevant data from disparate disciplinary sources. The prospect is of realizing Stephen Hawking’s prediction that “the next century [the 21st] will be the century of complexity”.
Yet our ability to combine data from heterogeneous sources and across disciplines remains limited in many instances and, at best, is excessively resource intensive. The adoption of new data-intensive techniques across scientific communities and practices is uneven, and the manual effort required to prepare and cleanse data before use is a considerable diversion of scientific resources. Ontologies and vocabularies are often incompatible and sometimes quite inadequate to the task.
Addressing these problems is crucial if we are to use to best effect the increasing quantities of diverse data to understand the complex systems that are at the heart of global challenges. Doing so will require the widespread adoption of replicable, generic approaches to data integration and FAIR (Findable, Accessible, Interoperable and Reusable) data standards in more science disciplines and interdisciplinary research areas. This is a decadal effort and its success will depend on active participation and engagement from all disciplines, including the social and human sciences, and by scientists from all parts of the world, including countries whose data science capacities may be limited.
More effective, evidence-based solutions for complex global challenges based on interdisciplinary collaboration enabled by data integration policies and practices across scientific fields and disciplines.
Working with the support of the ISC, the Council’s Committee on Data for Science and Technology (CODATA) has been developing technology and semantic good practice for data interoperability and integration. Based on this initiative, a three-pronged programme is planned, comprising:
- Good practice for data integration that is applicable across a wide range of disciplines.
- Interdisciplinary case studies in global challenge areas (infectious disease, resilient cities and disaster risk reduction) designed to contribute value in these areas but also to act as demonstrators of the value and importance of the approach in all areas of complex interdisciplinary science.
- Engagement with scientific unions and associations in programmes of work designed to promote progress across the disciplines of science that will enable
Programme development will be led on behalf of the ISC by CODATA, working in partnership with the Council’s World Data System (WDS) and the Research Data Alliance (RDA). The ISC will work to promote membership engagement in ways that extend this approach to new communities of scientists and stakeholders, including from developing regions, where the open science platforms described under Domain Four could be key agents for these processes.
2.2 Global data resources and governance
(Project for development)
It is 30 years since Tim Berners-Lee’s vision of universal connectivity and openness became the World Wide Web: open and accessible to all. The culture of sharing and collaboration that it embodied has progressively expanded from open source software, to open access publishing, to open data and entirely open analysis, and now to the growing open science movement. This modern offspring has much wider dimensions than open access publication and simply making data available. It extends to providing information on how to repeat or verify an analysis, exposing results that can be reused by others for comparison, confirmation, or deeper understanding and inspiration. But it has two even more ambitious and radical targets:
- Firstly, if we are to understand the complex systems that are at the heart of most global challenges (see project 2.1) by analysing the wide diversity of data that this involves, we need to have access to such data in an interoperable form. To achieve this would require a widespread ethos and practice of data sharing, not just within the publicly funded scientific community, but across the public and private sectors, including government, scientific publishers and international agencies.
- Secondly, there is now an opportunity, and arguably an obligation, to systematically open science and its outputs to a wider range of societal actors and to citizens in addressing shared problems and in the joint creation of actionable knowledge.
For both purposes, stimulating the widespread availability and use of data resources, and securing their effective governance, are vital issues for 21st-century science and for addressing today’s global challenges.
This is therefore a timely moment to consider the global data ecology, its governance, ownership, accessibility and usability in the data universe; to identify principles for the emergence of systems, protocols and commons from these typologies; and to envisage how a federated global commons might develop and operate to the benefit of science. It is also vital to understand how global society’s data patrimony may best be conserved when many scientific databases are supported by short-term funding in the absence of a sustainable business model. Maximizing affordable access to well-governed data is in the interest of all scientific fields and communities, as well as funders, publishers and companies handling large amounts of data.
A global, cross-sectoral coalition of support for principles and processes of data access, for the adoption of priorities for its federated governance, and for sustainable business models for key scientific databases in a way that aids the global scientific enterprise.
The ISC will convene a group of technical experts and representatives of the principal data-holding sectors to determine the scope and ambition of the project. An exploratory phase would consider the taxonomy of scientific data-holding and its governance within commercial enterprises and publicly funded research efforts, global and national monitoring entities, and national statistical and standards bodies. It would explore the role of open science platforms or ‘commons’ in supporting open data initiatives, and the potential for a global data commons. Its findings would be analysed by a high-level group of representatives from relevant bodies and by technical experts. This group would make recommendations on topics such as the optimal principles of data governance for adoption by the largest possible number of data holders, and would propose a programme for further action.