CrEDIBLE thematic working days, October 2-4, 2013
With the increasing on-line availability of various biomedical data sources, the ability to federate heterogeneous and distributed data sources is becoming critical to support multi-centric studies and translational research in medicine. The CrEDIBLE project organises 3 thematic working days on October 2-4 in Sophia Antipolis (near Nice, France), where experts are invited to present their latest work and discuss their approaches. The aim is to gather scientists from all disciplines involved in the set-up of distributed and heterogeneous medical data sharing systems (medical data representation, data mediation, data store federation, data semantics, workflows, … towards biomedical data integration), to provide an overview of this broad and complex area, to assess the state-of-the-art methods and technologies addressing it, and to discuss the open scientific questions it raises.
The methods for biomedical data distribution considered in the context of CrEDIBLE are:
Federation: the (virtual) fusion of geographically spread data stores which should appear to end users as a unique and coherent data source.
Mediation: the semantic alignment of heterogeneous data sources, which were often designed independently from each other.
Querying: the description of distributed data sets, defined through data retrieval queries that apply to the whole federated system.
Data flow: the use and enrichment of the federated data stores through data processing pipelines.
A previous edition of this workshop was organised in October 2012.
Working days organisation
The idea of this workshop is to have groups of ~30-minute presentations within a given theme/session, followed by a time slot for a panel discussion to share the presenters' experience on selected scientific questions and challenges. Talks should keep a balance between introducing the field (appropriate for a broad audience of scientists involved in all the areas covered by CrEDIBLE) and technical details (appropriate for expert scientists).
Thematic sessions
Session 1: Data repositories for secondary use of clinical and research data
Session goal: To report on concrete experience in developing systems that gather or index data to be shared and reused in research projects, covering user requirements, current technology limitations and future expectations.
Scientific questions:
Data indexing: how to meet the expectations of researchers in terms of precision of the vocabulary?
Data provenance: what do we actually need: detailed models of provenance, or more “distilled” information?
Access control: how to accommodate multiple access policies specified by contributing entities? Is it manageable in practice? Should it be applied to datasets only (e.g. images, signals) or to metadata as well?
Data federation: what level of data federation is required? What are the data sources to federate? What are the data models in use?
Session 2: Biomedical ontologies
Session goal: To discuss ontologies that model observation and measurement data (designed to facilitate the sharing and reuse of scientific data).
Scientific questions:
modelling related entities (observed entity, measured quality, measurement results, units of measurement) and relationships
relation with foundational ontologies such as DOLCE, DOLCE-CORE and BFO? compatibility?
relation to existing ontologies of qualities
how to model complex observation data such as images? notion of field?
how to model time-varying phenomena?
Context: heterogeneous data stores, within the same or related domains (e.g. biology, medical images, clinical records…), sensitive.
Scientific questions:
Reference model. Taxonomies or ontologies can be used as a reference model. Is this the most appropriate reference model? What are the target models of the methods presented?
Query language. SPARQL can be used as a query language to access data in heterogeneous databases. Is it the most appropriate query language? Which query languages are applicable to the methods presented?
How to mediate various data sources (Pros and cons of each approach. Use cases. Are there hybrid approaches?):
How to ensure access control in a heterogeneous deployment?
Session 4: Data federation
Context: non-partitioned data stores, no prior knowledge of the stores' content, common RDF representation, common reference ontology.
Scientific questions:
Query language. Is SPARQL (v1.1) the most appropriate language? What is the trade-off between expressiveness and performance?
Performance. What is the performance impact? How does the gain from parallel execution of queries compare with the network overhead, especially when deploying over a WAN?
Scalability. How scalable are the different methods proposed? To what scale have they been tested?
Reliability. What is the impact of low reliability? Can queries be partially processed in case of communication failures with some data stores? Can end-users be notified of the kind of information that is potentially missing?
Session 5: Graphs and reasoning
Scientific questions:
How to process large RDF graphs? Storage in databases, scalability of graph processing algorithms, graph indexing.
How can semantics described in ontologies be used to interpret RDF data? While the Web of data focusses on processing large data sets, the semantic Web involves costly reasoning processes. There is a trade-off to be found between the amount of data to process and the reasoning capabilities of the system.
Other scalability opportunities when addressing data querying: top-k query answer algorithms, probabilistic algorithms?
How to visualise large data graphs?
Agenda
Wednesday, October the 2nd, 2013
Welcome and introduction
Session 1: Data repositories for secondary use of clinical and research data
from 14:30 to 18:15
19:30, Dinner at Hotel Omega
Thursday, October the 3rd, 2013
Session 2: Biomedical ontologies
from 9:00 to 12:45
10:30, coffee break
(cancelled) Georgio Gkoutos (U. Cambridge): The ontology of Units of measurements and their relations to qualities
11:15, Panel discussion. Moderators: Bernard Gibaud, Gilles Kassel
Session 4: Data federation
Friday, October the 4th, 2013
Session 5: Graphs and reasoning
Workshop conclusions
Venue
The workshop will be held in the conference room, located on the ground floor of the I3S laboratory, in Sophia Antipolis, France. The I3S laboratory address is: Building “Algorithms B”, 2000 route des Lucioles, BP 121, 06903 Sophia Antipolis Cedex, FRANCE
Where is it?
How to get there?
By plane:
"Nice Cote d'Azur" airport (NCE). There are two terminals (T1 and T2) with a free and frequent shuttle bus running between them.
By train: “Antibes” train station. From the train station there are 3 options:
Bus: Envibus line number 11 is direct from Antibes train station (“SNCF” bus stop) to EPU building (“IUT” bus stop).
Bus: Envibus line number 1 runs from the train station bridge (“passerelle SNCF” bus stop) to Sophia Antipolis (“IUT” bus stop). Be aware that line number 1 buses have 2 different end stops: “Lycée Léonard de Vinci” (these stop before reaching Sophia Antipolis) and “Gare Routière de Valbonne - Sophia Antipolis” (take one of these).
Bus: Envibus line number 9 runs from bus stop “Vautrin bas” (in the vicinity of the Antibes train station bridge, a short walk on your left after crossing the bridge) to the “Belugues” bus stop in Sophia Antipolis (see map). Be aware that line number 9 buses have 2 different end stops: “Lycée Léonard de Vinci” (these stop before reaching Sophia Antipolis) and “Gare Routière de Valbonne - Sophia Antipolis” (take one of these).
By car: A8 highway. Exit at “Antibes - Sophia Antipolis”, then follow the signs for Sophia Antipolis until you reach the “carrefour des Chappes” roundabout shown on the map below.
Also see the interactive map.
Nearby hotels