XTech 2008: “The Web on the Move”, 6-9 May 2008, Dublin, Ireland

Representing, indexing and mining scientific data using XML and RDF: Golem and CrystalEye

Andrew Walkingshaw (University of Cambridge)
Track: Open data; Room: Goldsmiths 2
Chair: Mark Birbeck (webBackplane, W3C Invited Expert)

Modern science produces a lot of data. Whether it comes from experimental apparatus or from simulation, the volume of information a single scientist can produce is now often radically greater than it was even ten years ago. Scientists therefore need new tools to filter, mine and search both their own data and data produced by other researchers.

One major source of experimental data is the “supplementary data” attached to journal publications. Our CrystalEye repository exploits this by aggregating the supplementary data from journals that publish crystal structures and converting it to CML (Chemical Markup Language).

The question then becomes how to add value to this data. One approach is to make the data easier to search and discover, a task for which RDF in general, and SPARQL in particular, is well suited. Our Golem ontology language and pyGolem toolkit enable the layering of richer semantics onto CML. We use them to extract metadata from CrystalEye as RDF, and then use that RDF to build new interfaces to the repository, making the data in it easier to find, analyse and reuse.
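As a rough illustration of the idea of extracting metadata from CML as RDF-style triples and then searching it, here is a minimal sketch using only the Python standard library. It does not use pyGolem or the Golem language, whose APIs are not described here. The simplified CML-like sample document, the `ex:` vocabulary, the `http://example.org/crystal#` namespace, and the helper names `extract_triples` and `match` are all assumptions made up for this example. The `match` function stands in for a single SPARQL triple pattern such as `SELECT ?s WHERE { ?s ex:spaceGroup "P b c a" }`.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
EX = "http://example.org/crystal#"  # hypothetical vocabulary, for illustration only

# Simplified, illustrative CML-like input; not an actual CrystalEye record.
SAMPLE_CML = """\
<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="entry1">
    <formula concise="C 6 H 6"/>
    <property dictRef="ex:spaceGroup"><scalar>P b c a</scalar></property>
  </molecule>
  <molecule id="entry2">
    <formula concise="Na 1 Cl 1"/>
    <property dictRef="ex:spaceGroup"><scalar>F m -3 m</scalar></property>
  </molecule>
</cml>
"""


def extract_triples(cml_text):
    """Turn each molecule's formula and properties into (subject, predicate, object) triples."""
    root = ET.fromstring(cml_text)
    triples = []
    for mol in root.iter(f"{{{CML_NS}}}molecule"):
        subject = EX + mol.get("id")
        formula = mol.find(f"{{{CML_NS}}}formula")
        if formula is not None:
            triples.append((subject, EX + "formula", formula.get("concise")))
        for prop in mol.iter(f"{{{CML_NS}}}property"):
            scalar = prop.find(f"{{{CML_NS}}}scalar")
            # Use the local part of dictRef ("ex:spaceGroup" -> "spaceGroup") as the predicate name.
            name = prop.get("dictRef", "").split(":")[-1]
            if scalar is not None and name:
                triples.append((subject, EX + name, (scalar.text or "").strip()))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = extract_triples(SAMPLE_CML)
hits = match(triples, p=EX + "spaceGroup", o="P b c a")
print([s for s, _, _ in hits])
```

In a real deployment the triples would more likely be loaded into an RDF store and queried with SPARQL; the plain tuple list is used here only to keep the sketch self-contained.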

Andrew Walkingshaw

University of Cambridge

I’m a researcher in the Unilever Centre, part of the Chemical Laboratory at the University of Cambridge, working on building systems and languages for the representation and mining of large volumes of chemical data. Before that, I worked in theoretical chemical physics, designing algorithms for the prediction of diffusion and chemical reactions by atomistic simulation.

I’m part of the MaterialsGrid project, within which my major research interest is the Golem ontology language/toolkit. I blog at Brighten the Corners.