Handling a tsunami of data

News | by the editors
7 February 2014 | In the era of eScience, data is as valuable as the scientific research itself. Good data management practices are therefore indispensable. “Someone needs to take responsibility for these data.”

In Leiden, a unique group of data-experts, domain leaders and key opinion leaders from across the Life Sciences came together to take steps towards ensuring Data Stewardship for the future. Their plans include key aspects of a global infrastructure for effective data publishing, discovery, sharing and re-use for eScience experimentation.

Data not interoperable

The group included some of the most influential names in the field of biological data management, drawn from Europe, the US and Latin America, and representing leading research infrastructures and policy institutes, publishers, semantic web specialists, innovators, computer scientists and experimental (e)Scientists. The meeting, which took place at the Lorentz Center, was part of an ongoing collaboration with the Netherlands eScience Center (NLeSC).

“In the Life Sciences, the data is very complex, because it comes from many different levels, like populations, organisms and genomics. Therefore data is by nature not interoperable,” says Barend Mons, professor of biosemantics at Leiden University Medical Center, former director of the Netherlands Bioinformatics Centre and eScience Integrator, explaining why his research field is in dire need of a proper framework for the use of data.

It is “through professional data-finding that we can speed up discovery”. With that goal in mind, the group gathered to discuss the establishment of a ‘Virtual Research Environment’. Such an environment should overcome the problem of generated data being saved in an ad-hoc manner, with little or no metadata to explain the provenance of the data.

No reward for entering your data

“This is an effort to take it all one step further.” Niklas Blomberg is the founding director of ELIXIR, the European organization that aims to unite the leading life science organizations in managing and safeguarding the massive amounts of generated data. “There is no reward for entering your data and adding metadata to it,” Blomberg says. Instead, rewards revolve around scientific papers, even though the data itself is equally important. “Researchers are looking for support and guidance.”

With the current ‘tsunami of data’, combined with the technological possibilities to assess these data, it becomes more and more important to store and reproduce all the data in a way that everyone can use it. “We keep spitting out data without knowing how to use it,” Mons says. “We need to integrate all these datasets.”

“This is no longer a technological issue, like the framework. We are now talking about social change.” Social change is needed in the way researchers make their data available for future research. At this moment there is little incentive to do so; researchers should therefore be rewarded for sharing their data properly.

Bioscience could set an example

At ELIXIR, of which the Netherlands recently became the eighth formal member, Blomberg is trying to help researchers achieve these goals. Blomberg: “We have big training programs to help them deal with their data-management issues. We have data experts as well, but this also means that researchers have to entrust their core asset to an outsider.”

“It is about social willingness,” adds George Strawn, director of the US Federal Networking and Information Technology Research and Development (NITRD) program. “Bioscience is a good place to begin,” he says, and could make a case that proves very useful for other disciplines.

The effort made at the Lorentz Center is therefore an important one, Blomberg says. “It is about making the data sustainable. That is why meeting here to create a common set of rules is so important. I think this is a good assembly, as we needed to find a way to bring these people together.”

Backbone for global interoperability

The principal outcome of the meeting is that a backbone will be designed to enable global interoperability of data. The key requirement for achieving this is that computers can ‘independently’ browse all available datasets for a specific research requirement.

At the core of this ‘Data Fairport ecosystem’ will be a protocol that defines the basic semantic interoperability of datasets, where possible using endorsed or emerging community standards and protocols. However, Data Fairport is not about telling researchers how to store their data or manage their resources. The Fairport works from the principle of telling researchers: “this is not the metadata you should have, and this is how you can transcribe it into the proper format”.
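
What machine-browsable metadata makes possible can be illustrated with a small sketch. The field names and records below are hypothetical, not part of any Data Fairport specification; the point is simply that once a dataset description is structured and uses shared vocabulary, a computer can match it against a research query without human help.

```python
import json

# Hypothetical, minimal metadata records for two datasets.
# The fields (identifier, subject, organism, license) are
# illustrative only, not an actual Data Fairport schema.
datasets = [
    {"identifier": "doi:10.1234/example.1", "subject": "genomics",
     "organism": "Homo sapiens", "license": "CC-BY-4.0"},
    {"identifier": "doi:10.1234/example.2", "subject": "population-ecology",
     "organism": "Danio rerio", "license": "CC0"},
]

def find_datasets(records, **query):
    """Return records whose metadata matches every key/value in the
    query, mimicking a machine 'independently' browsing datasets."""
    return [r for r in records
            if all(r.get(k) == v for k, v in query.items())]

# A computer, not a researcher, selects the relevant dataset.
hits = find_datasets(datasets, subject="genomics")
print(json.dumps(hits, indent=2))
```

Without agreed field names and vocabularies, the same query would fail across repositories, which is exactly the interoperability gap the backbone is meant to close.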

At this point the aim of creating a Virtual Research Environment, or “Data Fairport”, is still in a startup phase, but Barend Mons is happy that the researchers came to Leiden to discuss this matter fundamentally. “Rather than money, it is knowledge that will get them out of their misery. For now we are modest in our short-term objectives. We need to develop a really solid plan and raise the funds to carry out those plans.”
