Using Microsoft OData for data distribution

The goal is to facilitate the use of PowerPivot and PivotViewer (initially, at least) for data navigation and discovery.

The Virtuoso server software used by data.oceandrilling.org supports exposing the RDF graphs to PivotViewer via CXML. (Example of future efforts: Example Pivot Viewer.)

Exposing all the data from the hosted linked data graphs via OData is, of course, also a goal.


Bridging data from USIOs into OData via Java

This is a simple test using odata4j (Ref: http://code.google.com/p/odata4j/). The test took the CSV-formatted result of a SPARQL query against the SPARQL endpoint (http://data.oceandrilling.org/sparql) and cast it to OData format.

The basic flow of events was as follows:

  • Data generated in CSV format via the SPARQL query below. This is also the data associated with URI: http://data.oceandrilling.org/januslod/parameter/pws_section_count/304/1309
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?s ?p ?o
    FROM <http://data.oceandrilling.org/januslod#>
    WHERE {
       ?s skos:related <http://oceandrilling.org/core/1/janus/pws_section_count> .
       ?s skos:broader <http://data.oceandrilling.org/januslod/parameter/pws_section_count/304/1309> .
       ?s ?p ?o .
    }
  • By means of the Java code referenced below, the CSV data (see input and output files below) was converted to OData
  • The resulting OData file (see input and output files below) was loaded into Tableau (via HTTP), and the data was immediately actionable
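The ingestion half of the flow above can be sketched in plain Java. This is illustrative only: the class and method names are hypothetical, the real test used odata4j, and the parser assumes fields contain no embedded commas (SPARQL CSV results of plain URIs usually satisfy this).

```java
import java.util.*;

// Minimal sketch of parsing a SPARQL CSV result ("s,p,o" header plus rows)
// into per-row maps keyed by variable name. Hypothetical names; the real
// ETL used odata4j. Assumes no embedded commas in field values.
public class SparqlCsv {
    public static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.trim().split("\\r?\\n");
        String[] header = lines[0].split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] fields = lines[i].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < header.length && j < fields.length; j++)
                row.put(header[j].trim(), fields[j].trim());
            rows.add(row);
        }
        return rows;
    }
}
```

Each map then corresponds to one candidate OData entry, which keeps the conversion step a pure model-to-model operation.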

The full version of Tableau is commercial, but this example was done with the free version. Note that Tableau could have been replaced by Excel, Excel + PowerPivot, Sesame, or any other client that consumes OData (Atom-like) data.

All of this was done by operations on models. The Java code above is simply an ETL from CSV to OData. Ironically, the data originally came from an RDF triple store, arguably a richer model than OData (until OData gains its vocabulary support). This example of a semantic ETL process is not meant to elevate one model above another.

Rather, this example demonstrates an approach that uses model operations, rather than specific service APIs, to deliver data to clients for investigating and exploring large amounts of data. Tools that prefer RDF can talk directly to the RDF store; tools that want CSV or OData also work in this architecture.
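The emission half of the ETL can be sketched the same way: rendering the parsed rows as Atom-style entries of the kind OData consumers expect. This is a simplified, hypothetical rendering; the real output was produced by odata4j and carries the full OData metadata namespaces.

```java
import java.util.*;

// Illustrative sketch: render rows (maps of variable name -> value) as a
// bare Atom feed. Real OData output (as produced by odata4j) adds the
// m:/d: OData namespaces and typed properties; this shows only the shape.
public class AtomSketch {
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
    public static String feed(String title, List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<feed xmlns=\"http://www.w3.org/2005/Atom\">");
        sb.append("<title>").append(esc(title)).append("</title>");
        for (Map<String, String> row : rows) {
            sb.append("<entry><content type=\"application/xml\">");
            for (Map.Entry<String, String> e : row.entrySet())
                sb.append("<").append(e.getKey()).append(">")
                  .append(esc(e.getValue()))
                  .append("</").append(e.getKey()).append(">");
            sb.append("</content></entry>");
        }
        sb.append("</feed>");
        return sb.toString();
    }
}
```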

The Janus LOD web application (Ref: http://data.oceandrilling.org/januslod/) has been modified to return OData in response to a content-negotiation request for Atom. Appending the .atom suffix to the end of the URL will also return the OData format.

So a URL like http://data.oceandrilling.org/januslod/parameter/thermcon_count/311/1329 can have .atom appended to it (http://data.oceandrilling.org/januslod/parameter/thermcon_count/311/1329.atom). Such a URL returns OData (atom/xml) and can be fed directly into an OData consumer like Tableau (see the middle image).

The screencast at the bottom of this page shows the sequence of navigating to a page of information in the Janus webapp and then loading the OData version of the feed into Tableau for plotting and exploration.

Other clients, such as Sesame or Excel, could also consume such OData feeds directly.

These examples never mapped URIs to more human-readable labels, as any production effort would. So the screen capture of the Tableau interface shows some rather long, unfriendly URIs that would gain human-readable labels in a "real" version.

Source code modified from in memory producer example

Input and output files

Other refs: