ELIXIR-EXCELERATE Interoperability Components from UK
ELIXIR-UK are co-leading the ELIXIR EXCELERATE platform for integration and interoperability of data and services. Currently the ELIXIR-UK Node contributes a number of components to ELIXIR's interoperability platform, described below, along with their role in ELIXIR and workplan.
1. BioSharing
What it is
BioSharing Information Resources are curated web-based searchable registries of linked information on content standards, databases, and (progressively) data policies in the life sciences, broadly covering the biological, natural and biomedical sciences. BioSharing ensures resources are registered, informative and discoverable, maximizing their adoption and use to assist the virtuous data cycle, from generation to standardization, through publication to subsequent sharing and reuse. BioSharing progressively monitors their maturity, collecting metrics of usage, and levels of endorsement and adoption.
BioSharing Information Resources offers: (i) extensive content, (ii) a network of linked collaborators, and (iii) growing visibility and recognition from funders and journals.
The focus is to map the landscape of community-developed content standards, for both data and experimental metadata, monitoring their:
- development, evolution and integration;
- implementation and use in databases; and
- adoption in data policies by funders and journals
Our goal is to assist researchers, developers, curators, funders, journals, and librarians who lack support and guidance on how best to navigate and select among the various content standards, understand their maturity, or find the databases that implement them; or who simply do not have enough information to make an informed decision on which content standards or databases should be recommended in a policy, funded, or implemented.
BioSharing has a growing userbase and an Advisory Board that includes major publishers, librarians, and service providers. It is run and supported by the same core operational team (based at the University of Oxford) who built the MIBBI portal.
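As an illustration of the kind of linked registry described above, the following minimal sketch models standards, databases and policies as records that reference the standards they implement or recommend. It is not BioSharing's actual data model or API; the class, the function names and the record names (Standard-A, Database-X, and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A registry entry: a content standard, a database, or a data policy."""
    name: str
    kind: str  # "standard", "database", or "policy"
    # Names of standards this database implements or this policy recommends.
    standards: set[str] = field(default_factory=set)


def using_standard(records, standard, kind):
    """Return names of records of the given kind that link to `standard`."""
    return sorted(
        r.name for r in records if r.kind == kind and standard in r.standards
    )


def adoption_counts(records):
    """Count, per standard, how many databases and policies link to it."""
    counts = {r.name: 0 for r in records if r.kind == "standard"}
    for r in records:
        for s in r.standards:
            if s in counts:
                counts[s] += 1
    return counts


# Hypothetical registry content, for illustration only.
records = [
    Record("Standard-A", "standard"),
    Record("Standard-B", "standard"),
    Record("Database-X", "database", {"Standard-A"}),
    Record("Database-Y", "database", {"Standard-A", "Standard-B"}),
    Record("Policy-P", "policy", {"Standard-A"}),
]

print(using_standard(records, "Standard-A", "database"))
print(adoption_counts(records))
```

Keeping the links on the database and policy records means a simple count per standard gives a rough adoption metric of the kind the registry monitors.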
Role in ELIXIR
Work has started to review and enhance BioSharing as the ELIXIR Standards Catalogue, a task within the EXCELERATE WP5 Interoperability Backbone Platform. The Y1 workplan, led by the ELIXIR-UK and ELIXIR-Swiss Nodes, is outlined here. This initial phase will be driven by use cases and evaluation criteria established by WP5; the result will inform a roadmap to guide the next phases, including service provision from the UK Node.
BioSharing will complement content for the Tools and Service Registry, and the TeSS system, progressively linking training material on standards (how to create, maintain and evolve them) and their use and implementation in databases and tools.
2. ISA framework
What it is
ISA is a community-driven metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of datasets in an increasingly diverse set of life science domains.
At the heart of this open source framework is the general-purpose ISA format (ISA-Tab, a tabular version, and LinkedISA, the RDF linked data version: http://isa-tools.github.io/linkedISA), built on the ‘Investigation’ (the project context), ‘Study’ (a unit of research) and ‘Assay’ (analytical measurement) metadata categories. The extensible, hierarchical structure of this research object enables the representation of studies employing one or a combination of technologies, focusing on the description of their experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships). The ISA software suite (http://www.isa-tools.org), the second element of this framework, is used to create and edit ISA-formatted files, and to store, serve and convert them to a growing number of other related formats; other tools also exist to manipulate ISA-formatted experiments.
The open ISA framework belongs to its community of users and contributors, who are assisted by a dedicated team based at the University of Oxford that has supported its open development since 2007.
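The Investigation, Study and Assay hierarchy can be pictured with a small sketch. This is an illustrative model only, not the ISA-Tab file layout or the API of the ISA software suite; the class and field names, identifiers and file names are assumptions chosen to mirror the metadata categories named above.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """An analytical measurement: what was measured, and with which technology."""
    measurement_type: str
    technology_type: str
    # Sample-to-data relationships: sample name -> data file names.
    sample_to_data: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Study:
    """A unit of research, holding sample characteristics and its assays."""
    identifier: str
    # Sample name -> characteristics such as organism or tissue.
    samples: dict[str, dict[str, str]] = field(default_factory=dict)
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups related studies."""
    identifier: str
    studies: list[Study] = field(default_factory=list)

    def technologies(self) -> set[str]:
        """Collect every technology type used across all studies."""
        return {a.technology_type for s in self.studies for a in s.assays}


# Hypothetical example: one study combining two technologies.
study = Study(
    identifier="S1",
    samples={"sample-1": {"organism": "Homo sapiens"}},
    assays=[
        Assay("transcription profiling", "DNA microarray",
              {"sample-1": ["array-1.txt"]}),
        Assay("metabolite profiling", "mass spectrometry",
              {"sample-1": ["ms-1.raw"]}),
    ],
)
investigation = Investigation(identifier="I1", studies=[study])
print(sorted(investigation.technologies()))
```

Because each assay carries its own technology and measurement types, a single study can describe several technologies applied to the same samples, which is the multi-technology case the format is designed for.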
Role in ELIXIR
ELIXIR’s strategy is to support wide-scale interoperability of public datasets, and the ISA metadata tracking framework is designed and used to facilitate integration of datasets at the metadata level. Furthermore, ISA fulfills the FAIR concept, which underpins the EXCELERATE proposal, and works as an integrated element of the linked data ecosystem, complementing other research objects. To initiate this work, a first “ISA as a FAIR research object” Hack-the-Spec event was held on 20-22 July 2015, which also included an ELIXIR-NL representative (here is their blog report of the event).
3. Open PHACTS IMS
What it is
Open PHACTS is a Public Private Partnership between the EFPIA partners of the pharmaceutical industry and research institutes in Europe, funded by the EU. It is the first of the IMI projects to deliver its promised infrastructure and to be granted an extension. To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS consortium has built the Open PHACTS Discovery Platform: a freely available platform that integrates pharmacological data from a variety of public information resources, provides tools and services to query this integrated data in support of pharmacological research, and supports an API for an ecosystem of applications.
The Open PHACTS Discovery Platform is Linked Data based, i.e. it draws in dataset descriptors and dataset content in Linked Data (RDF) format. Datasets are described using standards for provenance, using properties from the PAV ontology. Data integration in Linked Data relies on equality links between resources across different datasets. Public datasets are therefore accompanied by VoID linksets, that is, the mappings between entities within the same dataset or across datasets. Linksets are the glue between datasets. A VoID linkset contains a collection of link triples that relate the entries in a pair of datasets through a single mapping relationship. The linked datasets are themselves described in VoID using a checklist of properties, e.g. the license, the version number, and the location of a query access endpoint containing the data.
In Open PHACTS the current 49 contributing datasets, which contribute several billion dataset statements, also contribute nearly 36 million link statements. These links represent (i) two entries that capture different aspects of the same real-world concept (the ChemSpider and ChEMBL entries for imatinib mesylate), (ii) two entries that are highly related (the ChemSpider and DrugBank records for gleevec), and (iii) an entry that is a relevant reference but not the same real-world concept (the protein target that gleevec interacts with in the body).
Identifiers.org addresses the problem of multiple data mirrors, where the same logical resource can be given multiple URIs. The Entity Name System addresses the identification of entities by providing a URI for a concept that can be used unambiguously.
Linksets are generated using identity authorities and mappings to counterparts, in partnership with the data source providers. The Identity Mapping Service (IMS) is managed by the University of Manchester as a member of the Open PHACTS Platform Core Team.
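The linkset idea can be sketched in a few lines: each linkset relates one pair of datasets through a single mapping relationship, and a lookup follows only the relationships the caller accepts. This is a minimal illustration, not the IMS implementation or its API. The dataset names and example.org URIs are hypothetical, and the use of skos:exactMatch and skos:closeMatch as mapping predicates is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkset:
    """Link triples relating one pair of datasets through a single predicate."""
    subject_dataset: str
    object_dataset: str
    predicate: str
    links: tuple[tuple[str, str], ...]  # (subject URI, object URI) pairs

    def triples(self):
        """Expand the pairs into full (subject, predicate, object) triples."""
        return [(s, self.predicate, o) for s, o in self.links]


def build_index(linksets):
    """Index links in both directions (assumes symmetric predicates)."""
    index = defaultdict(set)
    for linkset in linksets:
        for s, o in linkset.links:
            index[s].add((linkset.predicate, o))
            index[o].add((linkset.predicate, s))
    return index


def mapped_uris(index, uri, predicates):
    """Return URIs linked to `uri` by any of the accepted predicates."""
    return sorted(o for p, o in index.get(uri, ()) if p in predicates)


# Hypothetical linksets: one for "same concept", one for "highly related".
same_concept = Linkset(
    "dataset-a", "dataset-b", "skos:exactMatch",
    (("http://example.org/a/1", "http://example.org/b/1"),),
)
related = Linkset(
    "dataset-a", "dataset-c", "skos:closeMatch",
    (("http://example.org/a/1", "http://example.org/c/9"),),
)
index = build_index([same_concept, related])

strict = mapped_uris(index, "http://example.org/a/1", {"skos:exactMatch"})
relaxed = mapped_uris(
    index, "http://example.org/a/1", {"skos:exactMatch", "skos:closeMatch"}
)
print(strict)
print(relaxed)
```

Keeping one mapping relationship per linkset lets a client choose how strictly to interpret equality: the strict lookup follows only same-concept links, while the relaxed lookup also follows links between highly related entries.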
Role in ELIXIR
The Identity Mapping Service (IMS) provides the service for handling the links between the data in datasets. We anticipate the use of machine-readable metadata for datasets and the content of datasets as a platform for interoperability. Moreover, the knowledge and skills associated with linkset making and link management are also offered as a service.
4. BioCatalogue
What it is
The BioCatalogue is a Catalogue of Web Services relevant for the life sciences. Its prime audience is the developers of bioinformatics infrastructures, tools and applications, and technically proficient bioinformaticians.
The BioCatalogue was established in 2008 as a joint enterprise of the EMBL-EBI and The University of Manchester. The BioCatalogue software platform is maintained by Manchester; the resource is hosted by the EBI. Manchester takes chief responsibility for the curation sprints. The BioCatalogue supports REST and SOAP Web Services, and will shortly support WMS/WPS services. It has a sister catalogue, the BioDiversityCatalogue, which uses the same software platform. The BioDiversityCatalogue will be hosted by Naturalis and will be the focus of a Special Interest Group of the TDWG.
The BioCatalogue has a REST API and supports the registration, monitoring, discovery and annotation of Web Services.
Role in ELIXIR
ELIXIR’s strategy is to support wide-scale interoperability of public datasets, using explicitly published APIs with rich metadata. The BioCatalogue is a registry of Web Service APIs. By adopting the EDAM ontology for classifications it is part of the BioRegistry ecosystem of EXCELERATE, supporting the niche of richly described and maintained APIs for ELIXIR Web Services.
ELIXIR is an infrastructure, and EXCELERATE an infrastructure project. The BioCatalogue is squarely aimed at infrastructure stakeholders.