
CrEDIBLE thematic working days, October 2-4, 2013

Due to the increasing on-line availability of various biomedical data sources, the ability to federate heterogeneous and distributed data sources becomes critical to support multi-centric studies and translational research in medicine. The CrEDIBLE project is organising three thematic working days on October 2-4 in Sophia Antipolis (near Nice, France), where experts are invited to present their latest work and discuss their approaches. The aim is to gather scientists from all disciplines involved in setting up distributed and heterogeneous medical data sharing systems (medical data representation, data mediation, data stores federation, data semantics, workflows, … towards biomedical data integration), to provide an overview of this broad and complex area, to assess the state-of-the-art methods and technologies addressing it, and to discuss the open scientific questions it raises.

The methods for biomedical data distribution considered in the context of CrEDIBLE are:

A previous edition of this workshop was held in October 2012.

Working days organisation

The idea of this workshop is to group presentations of ~30 minutes each within a given theme/session, followed by a time slot for a panel discussion in which presenters share their experience on selected scientific questions and challenges. Talks should keep a balance between introducing the field (appropriate for a broad audience of scientists involved in all the areas covered by CrEDIBLE) and technical details (appropriate for expert scientists).

Thematic sessions

Session 1: Data repositories for secondary use of clinical and research data

Session goal: To report on concrete experience in developing systems that gather or index data to be shared and reused in research projects, covering user requirements, current technology limitations and future expectations.

Scientific questions:

  1. Data indexing: how to meet the expectations of researchers in terms of precision of the vocabulary?
  2. Data provenance: what do we actually need: detailed models of provenance, or more “distilled” information?
  3. Access control: how to accommodate multiple access policies specified by contributing entities? Is it manageable in practice? Should it be applied to datasets only (e.g. images, signals) or to metadata as well?
  4. Data federation: what level of data federation is required? What are the data sources to federate? What are the data models in use?

Session 2: Biomedical ontologies

Session goal: To discuss ontologies modelling observation and measurement data (designed to facilitate the sharing and reuse of scientific data).

Scientific questions:

  1. modelling related entities (observed entity, measured quality, measurement results, units of measurements) and relationships
  2. relation with foundational ontologies such as DOLCE, DOLCE-CORE and BFO? compatibility?
  3. relation to existing ontologies of qualities
  4. how to model complex observation data such as images? notion of field?
  5. how to model time-varying phenomena?

Session 3: Data mediation

Context: heterogeneous data stores, within the same or related domains (e.g. biology, medical images, clinical records…), sensitive.

Scientific questions:

  1. Reference model. Taxonomies or ontologies can be used as the reference model. Is this the most appropriate reference model? What are the target models of the methods presented?
  2. Query language. SPARQL can be used as the query language to access data in heterogeneous databases. Is it the most appropriate query language? Which query languages are applicable to the methods presented?
  3. How to mediate various data sources (Pros and cons of each approach. Use cases. Are there hybrid approaches?):
    • statically (ETL): transform data source periodically (e.g. into RDF stores).
    • dynamically: query on-the-fly.
  4. How to ensure access control in a heterogeneous deployment?
    • at coarse-grain (each data repository)
    • at fine-grain (each entity within a data repository)
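
To make the static versus dynamic distinction in question 3 concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `RECORDS` source, the `http://example.org/` namespace and the function names are hypothetical, and the output only imitates N-Triples syntax. It is not a method presented at the workshop.

```python
# Hypothetical relational-style source: one dict per patient record.
RECORDS = [
    {"id": "p1", "age": 54, "modality": "MRI"},
    {"id": "p2", "age": 61, "modality": "CT"},
]

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def row_to_triples(row):
    """Map one source row to (subject, predicate, object) triples."""
    subject = f"<{BASE}patient/{row['id']}>"
    return [
        (subject, f"<{BASE}age>", f'"{row["age"]}"'),
        (subject, f"<{BASE}modality>", f'"{row["modality"]}"'),
    ]


def etl_export(records):
    """Static mediation: materialise every row up front as N-Triples-style lines."""
    return [f"{s} {p} {o} ." for row in records for s, p, o in row_to_triples(row)]


def query_on_the_fly(records, predicate, value):
    """Dynamic mediation: translate rows only when a query arrives."""
    wanted_p = f"<{BASE}{predicate}>"
    wanted_o = f'"{value}"'
    return [
        s
        for row in records
        for s, p, o in row_to_triples(row)
        if p == wanted_p and o == wanted_o
    ]
```

The static export has to be re-run whenever the source changes, while the dynamic variant always reflects the current source but pays the translation cost on every query.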

Session 4: Data federation

Context: non-partitioned data stores, no prior knowledge of the stores' content, common RDF representation, common reference ontology.

Scientific questions:

  1. Query language. Is SPARQL (v1.1) the most appropriate language? What is the trade-off between expressiveness and performance?
  2. Performance. What is the performance impact? What is the gain of parallel query execution compared with the network overhead, especially when deploying over a WAN?
  3. Scalability. How scalable are the different methods proposed? To what scale have they been tested?
  4. Reliability. What is the impact of low reliability? Can queries be partially processed in case of communication failures with some data stores? Can end-users be notified of the kind of information that may be missing?
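
The reliability question can be illustrated with a small Python sketch of a federated query that returns partial results and reports which stores did not answer. This is only one possible design under stated assumptions: each data store is modelled as a plain callable (a stand-in for a real endpoint client), and `federated_query`, `store_a` and `store_b` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(sources, query):
    """Send `query` to every source in parallel and tolerate failing sources.

    `sources` maps a source name to a callable that takes the query and
    returns a list of result rows (hypothetical stand-in for an endpoint).
    Returns (rows, missing), where `missing` names the sources that failed.
    """
    rows, missing = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                rows.extend(future.result(timeout=5))
            except Exception:
                # Communication failure or timeout: keep going, remember the gap.
                missing.append(name)
    return rows, missing


def store_a(query):
    # Hypothetical store that answers normally.
    return [("patient/p1", "MRI")]


def store_b(query):
    # Hypothetical store that is unreachable.
    raise ConnectionError("store unreachable")


rows, missing = federated_query({"A": store_a, "B": store_b}, "SELECT ...")
```

Returning the list of failed stores alongside the rows lets a user interface state that the answer is incomplete and which repositories are not covered by it.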

Session 5: Graphs and reasoning

Scientific questions:

  1. How to process large RDF graphs? Storage in databases, scalability of graph processing algorithms, graph indexing.
  2. How can semantics described in ontologies be used to interpret RDF data? While the Web of data focusses on large data sets processing, the semantic Web involves costly reasoning processes. There is a trade-off to be found between the amount of data to process and the reasoning capabilities of the system.
  3. Other scalability opportunities when addressing data querying: top-k query answer algorithms, probabilistic algorithms?
  4. How to visualise large data graphs?
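
As an illustration of the top-k idea mentioned in question 3, the following Python sketch keeps only the k best-scored answers while streaming over candidates, so memory stays bounded by k. The scoring itself is assumed to exist already; `top_k` and the (score, answer) pairs are hypothetical and not taken from any system discussed at the workshop.

```python
import heapq


def top_k(scored_answers, k):
    """Return the k highest-scoring (score, answer) pairs, best first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding at most k items; heap[0] is the weakest kept
    for score, answer in scored_answers:
        if len(heap) < k:
            heapq.heappush(heap, (score, answer))
        elif score > heap[0][0]:
            # New answer beats the weakest kept one: swap them.
            heapq.heapreplace(heap, (score, answer))
    return sorted(heap, reverse=True)
```

For example, `top_k([(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d")], 2)` keeps the answers scored 0.9 and 0.7 and discards the rest without ever storing more than two candidates.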

Agenda

Wednesday, October the 2nd, 2013

Welcome and introduction

Session 1: Data repositories for secondary use of clinical and research data

from 14:30 to 18:15

19:30, Dinner at Hotel Omega

Thursday, October the 3rd, 2013

Session 2: Biomedical ontologies

from 9:00 to 12:45

Session 3: Data mediation

from 14:00 to 16:00

Session 4: Data federation

from 16:15 to 18:45

Friday, October the 4th, 2013

Session 5: Graphs and reasoning

from 9:00 to 12:00

Workshop conclusions

Venue

The workshop will be held in the conference room on the ground floor of the I3S laboratory, in Sophia Antipolis, France. The I3S laboratory address is: Building “Algorithms B”, 2000 route des Lucioles, BP 121, 06903 Sophia Antipolis Cedex, FRANCE

Where is it?

How to get there?

Also see the interactive map.

Nearby hotels

A list of nearby hotels (all within 10 minutes walking distance) is located on this map:

http://www.hotel-bb.com/hotel_info?hotelId=4532

http://www.ibishotel.com/gb/hotel-0711-ibis-antibes-sophia-antipolis/index.shtml

http://www.accorhotels.com/gb/hotel-1122-mercure-antibes-sophia-antipolis/index.shtml

http://www.hotelomega.com/en/index.php

http://www.accorhotels.com/gb/hotel-1279-grand-hotel-mercure-sophia-country-club/index.shtml

http://www.novotel.com/gb/hotel-0398-novotel-sophia-antipolis/index.shtml