Data Federation Review for USIO's (email) (Google Talk)
@fils (twitter)


Do no harm!

  • The IOs have their own patterns and approaches
  • Look to cross-IO needs
    • metadata extraction and data archiving
    • data navigation (discovery, navigation and query)
    • Tool integration (Corewall, CoreRef, GeoMapApp, GIS, statistics, etc.)
  • Help identify and address future goals across IOs
    • enhance discovery, linking and use of data
    • remote copies for ship use
    • new (other 3rd party) tools and data extraction capacity

Models, Queries, Federation (oh my!)

  • Data model -> service -> data model (we do a lot of this)
    We have DMs on both ends and we don't add much value in the middle
  • Building queries (and services)
    • limited developer resources
    • unlimited ideas for queries and end points from community
    • need to support exploring, not just query (need both of course)
  • Federation
    Federation at the warehousing or query level (GAS, LAS) takes maintenance and governance that are hard to scale out without visible results from the effort
Behavior 1/3


  • Dictating formats is unlikely to work (XML, JSON, CSV, etc.) (don't confuse a model with its serialization)
  • Accommodate other approaches: be able to consume and expose many formats, in several patterns
  • Facilitate working with the data, not so much the services (operate on the model, not an API)
Behavior 2/3


Open == expose the DATA

  • Doesn't mean no query or parameter UIs... but those are not enough
  • Expose the model in such a way it can be harvested if desired
  • Allow others to directly query and operate (extend locally) the data
Behavior 3/3


  • Expose concepts and resources that are natural links
  • Use those concepts and resources to link across data sets
  • Facilitate exploring based on these "keys"
  • For us these are things like: leg (exp), site, hole, taxon, lith, instrument, procedure, and many others
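A minimal sketch of linking two data sets on shared "keys" such as site and hole. The records, field names and identifiers below are hypothetical and only illustrate the pattern; they are not taken from any IO's data model.

```python
from collections import defaultdict

# Hypothetical records from two independent data sets (illustrative only).
samples = [
    {"id": "s1", "leg": "LEG-1", "site": "SITE-1", "hole": "A"},
    {"id": "s2", "leg": "LEG-1", "site": "SITE-2", "hole": "B"},
]
measurements = [
    {"id": "m1", "site": "SITE-1", "hole": "A", "instrument": "scanner"},
    {"id": "m2", "site": "SITE-3", "hole": "A", "instrument": "scanner"},
]


def index_by(records, keys):
    """Group record ids by the tuple of values found under the given keys."""
    index = defaultdict(list)
    for rec in records:
        index[tuple(rec[k] for k in keys)].append(rec["id"])
    return index


def link(left, right, keys):
    """Return, per shared key value, the ids from both data sets that carry it."""
    left_idx = index_by(left, keys)
    right_idx = index_by(right, keys)
    return {k: (left_idx[k], right_idx[k]) for k in left_idx if k in right_idx}


links = link(samples, measurements, ("site", "hole"))
print(links)
```

Neither data set has to know about the other; the only agreement needed is on the key concepts used for the link.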

Linked Data (pragmatic)

  • Agreements between concepts do NOT have to be in the host data model
  • Focus on the pattern not the implementation
  • Have tried up to this point not to talk about SKOS (OWL), RDF, SPARQL, etc.
  • Experience with this so far has been rewarding... let me show you

Query multiple graphs mediated by vocabulary

  • Two separate graphs (schemas): sample inventories from ESO and CDEX
  • Two separate vocabularies mediated by a common taxonomy
  • Issue SPARQL queries to pull resources from both graphs based on this taxonomy
Demo.... ci.sparql
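The mediation pattern can be sketched without any RDF tooling. The demo itself uses SPARQL over real graphs; the plain-Python version below only illustrates the idea, and every prefix, term and resource name in it is a hypothetical placeholder.

```python
# Each graph is a set of (subject, predicate, object) triples that uses
# its own local vocabulary. All names here are made up for illustration.
graph_a = {
    ("a:sample1", "a:materialType", "a:Basalt"),
    ("a:sample2", "a:materialType", "a:Clay"),
}
graph_b = {
    ("b:core9", "b:lithology", "b:basaltic_rock"),
    ("b:core10", "b:lithology", "b:sand"),
}

# The mediating taxonomy lives outside both graphs: it maps each local
# term to a shared concept.
taxonomy = {
    ("a:Basalt", "skos:exactMatch", "tax:basalt"),
    ("b:basaltic_rock", "skos:exactMatch", "tax:basalt"),
    ("a:Clay", "skos:exactMatch", "tax:clay"),
}


def resources_for_concept(concept, graphs, mapping):
    """Find resources in every graph whose local term maps to the shared concept."""
    local_terms = {
        s for (s, p, o) in mapping if p == "skos:exactMatch" and o == concept
    }
    hits = []
    for name, graph in graphs.items():
        for s, _p, o in graph:
            if o in local_terms:
                hits.append((name, s))
    return sorted(hits)


result = resources_for_concept(
    "tax:basalt", {"A": graph_a, "B": graph_b}, taxonomy
)
print(result)
```

The agreement between the two vocabularies sits in the taxonomy, so neither host data model has to change.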
GE 1/2

Query and transform

  • SPARQL to result set (XML)
  • XSL to new serialization (KML)
Let's get to it...
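A rough sketch of the query-and-transform step. The slides use XSL for the transform; this version swaps in Python's standard-library ElementTree to show the same mapping from a SPARQL XML result set to KML. The variable names (site, lat, long) and the sample values are assumptions, not the demo's actual query.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"
KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical result set in the SPARQL Query Results XML Format.
SAMPLE_RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="site"/><variable name="lat"/><variable name="long"/></head>
  <results>
    <result>
      <binding name="site"><literal>SITE-1</literal></binding>
      <binding name="lat"><literal>10.5</literal></binding>
      <binding name="long"><literal>-45.25</literal></binding>
    </result>
  </results>
</sparql>"""


def sparql_xml_to_kml(results_xml):
    """Turn each result row into a KML Placemark with a Point."""
    ns = {"sr": SPARQL_NS}
    root = ET.fromstring(results_xml)
    ET.register_namespace("", KML_NS)  # serialize KML with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for result in root.findall("sr:results/sr:result", ns):
        # Collect the row as {variable name: value text}.
        row = {}
        for binding in result.findall("sr:binding", ns):
            row[binding.get("name")] = binding.find("*").text
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["site"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{row['long']},{row['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


kml_text = sparql_xml_to_kml(SAMPLE_RESULTS)
print(kml_text)
```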
GE 2/2

Query and transform

Exhibit 1/3

3rd party UI (MIT Exhibit)


  • Can the approach adapt to a 3rd party web app UI?
  • Address issue of RDF (XML) to JSON
  • Faceted browsing

Steps taken

  • Made SPARQL query
  • JSON bent pipe
  • JavaScript to transform JSON -> JSON
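The JSON -> JSON step might look roughly like the sketch below. The original work used JavaScript; Python is used here only for illustration. The input follows the SPARQL JSON results layout. The output layout (an "items" list whose entries carry "label" and "type") is my understanding of what Exhibit expects and should be checked against the Exhibit documentation. Variable names and URIs are hypothetical.

```python
import json

# Hypothetical SPARQL JSON result set (head/vars and results/bindings).
SPARQL_JSON = json.dumps({
    "head": {"vars": ["sample", "site", "lith"]},
    "results": {"bindings": [
        {"sample": {"type": "uri", "value": "http://example.org/sample/1"},
         "site": {"type": "literal", "value": "SITE-1"},
         "lith": {"type": "literal", "value": "basalt"}},
        {"sample": {"type": "uri", "value": "http://example.org/sample/2"},
         "site": {"type": "literal", "value": "SITE-2"}},
    ]},
})


def sparql_json_to_exhibit(text, label_var, item_type):
    """Flatten SPARQL bindings into a list of simple items for a faceted UI."""
    data = json.loads(text)
    items = []
    for binding in data["results"]["bindings"]:
        # Keep only the plain value of each bound variable; unbound
        # variables are simply absent from the item.
        item = {var: cell["value"] for var, cell in binding.items()}
        item["label"] = item[label_var]
        item["type"] = item_type
        items.append(item)
    return {"items": items}


exhibit = sparql_json_to_exhibit(SPARQL_JSON, "sample", "Sample")
print(json.dumps(exhibit, indent=2))
```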
Exhibit 2/3

3rd party UI (MIT Exhibit)

Exhibit 3/3

3rd party UI (MIT Exhibit)

Codices 1/1

Build out the links

  • It's linked data (we sometimes get the data and not the links)
  • Codices (link on LSH, timescale, etc.)
  • owl:sameAs, skos:related, skos:*, pvr:* (?)
Let's see it

Use the pattern to carry this

  • Provenance vocab (implementation of OPM)
  • Incorporate into the web, SPARQL and data dump access methods
  • Relate to voiD (and also other metadata like DC, etc.)

Metadata and archiving

  • Can we build out the ISO metadata (ISO 19115-2)?
  • Can we automate our archive snapshots (and enhance with other info from these graphs)?

Leverage Web Arch for Discovery

  • robots.txt
  • sitemap.xml
  • WADL, voiD, Semantic Site Map
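As a small illustration of the first two items, the sketch below generates a sitemap.xml and a robots.txt that points crawlers at it. The URLs are hypothetical placeholders; a real deployment would list the actual resource and data dump locations.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize a list of URLs as a sitemaps.org urlset document."""
    ET.register_namespace("", SITEMAP_NS)  # emit a default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for location in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = location
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url):
    """Allow all crawlers (empty Disallow) and advertise the sitemap."""
    return f"User-agent: *\nDisallow:\nSitemap: {sitemap_url}\n"


# Hypothetical resource URLs, for illustration only.
sitemap = build_sitemap([
    "http://example.org/id/site/SITE-1",
    "http://example.org/dumps/samples.rdf",
])
robots = build_robots_txt("http://example.org/sitemap.xml")
print(sitemap)
print(robots)
```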


  • Thanks and questions
  • Spy on me at: