Saturday, January 22, 2005

Data interoperability, if you think you have problems...

A large part of our work at Simulacra involves trying to establish data interoperability standards across organisations, often involving both the public and private sectors. Most of our work over the last five years has been in the UK education environment, attempting to get the various parties involved in ICT in education working together better. We have helped to create various metadata standards in the UK (both completely new ones and profiles of international standards), we have helped to develop standard vocabularies and taxonomies for describing educational learning materials, and we have built one of the core educational websites.

It is easy for us to look at our work and admire how much we have achieved, and it is a lot. But most of the problems we have faced have been political in nature: persuading competing companies and different government organisations to talk to each other. The technical challenges have not been so great; not tiny, but not massive either. At least they are not massive when you compare them against the work of others in different fields.

If you have read my last post you will know that I am very interested in all things space and astronomy related. As a space nut, and a developer heavily involved in data interoperability standards and the practical technology that makes them work, I find the Virtual Observatory concept fascinating. You will understand why once I explain it.

Astronomy used to be just people around the world using telescopes and writing to each other about what they had seen, with perhaps a sketch or two. As time went on, information was shared more freely in journals and at regular international meetings. Then photography allowed these observations to be recorded, and the photos could be reprinted. Finally we moved into the digital age, and CCD images can now be shared around the world via the Internet almost as soon as they have been taken.

As well as the more traditional ground-based optical telescopes we now have space-based observatories (such as Hubble), and people are observing in wavelengths across the complete spectrum. This creates a hell of a lot of data.

While all observations of the heavens have a specific purpose, it is reasonable to expect that the images and data produced can be reused in future research. A clear example of this is verifying the orbit of a newly discovered asteroid, such as the recent 2004 MN4, which was initially thought to have a chance of hitting Earth in 2029. Its orbit was corrected when an image of it was found in an old observation that was not intended for asteroid hunting.

This reuse of observation data, and the fact that there is so much of it streaming in constantly from all over the world and from orbit, led people to suggest the Virtual Observatory concept. This is quite simple really: the idea is to create standards for searching observatory archives and exchanging data. Along with the use of Grid Computing techniques, this could lead to much more automation in the reuse of this data. For example, it would be relatively easy to produce image recognition programs to search the world's observation archives for asteroids or comets (i.e. things in the image that shouldn't be there). This of course is only the tip of the iceberg of possible uses.
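To make the "things in the image that shouldn't be there" idea a little more concrete, here is a toy sketch in Python. It is not how any real survey pipeline works. It assumes two already aligned frames of the same patch of sky, represented as small grids of made-up brightness values, and flags any pixel that is much brighter in the newer frame. The function name find_new_sources and the threshold value are my own inventions for illustration.

```python
def find_new_sources(reference, new_frame, threshold=50):
    """Return (row, col) positions that are much brighter in new_frame.

    Both frames are lists of rows of pixel brightness values and are
    assumed to be the same size and already aligned on the sky.
    """
    hits = []
    for r, (ref_row, new_row) in enumerate(zip(reference, new_frame)):
        for c, (ref_px, new_px) in enumerate(zip(ref_row, new_row)):
            # A large positive difference means something appeared here.
            if new_px - ref_px > threshold:
                hits.append((r, c))
    return hits


# Made-up pixel values: one known star that appears in both frames.
reference = [
    [10, 12, 11, 10],
    [11, 200, 12, 10],
    [10, 11, 10, 12],
]
# The same field later, with an extra bright spot at row 2, column 2.
new_frame = [
    [10, 12, 11, 10],
    [11, 201, 12, 10],
    [10, 11, 180, 12],
]

candidates = find_new_sources(reference, new_frame)
print(candidates)
```

The known star is ignored because it is equally bright in both frames, so only the newcomer is reported. The hard part in practice is everything this sketch assumes away: finding the right archive images, aligning them and comparing data from different instruments, which is exactly where shared standards come in.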

While I said the concept is quite simple, the implementation of it isn't. The effort is being led through the International Virtual Observatory Alliance (IVOA), who are bringing people from around the world together to help create the required standards. The metadata schemes and interoperability standards being produced are complex; you only have to look at a small subset of them to see that. Try the Space-Time Coordinate Metadata standard for example.

Beyond this there are standards or proposals for:
  • Observation information (location, telescope information etc)
  • Tabular Data (the VOTable spec allows interoperability of tabular data)
  • Datatype (for columns in the above tabular spec)
  • Resource descriptions
  • Archive search queries
  • Image retrieval from archives
  • Use Cases for all of this
  • Global VO architecture
The list goes on and on and none of these are exactly simple tasks.
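To give a flavour of the tabular data item in the list, here is a minimal sketch in Python that reads a heavily simplified VOTable-style document using only the standard library. The element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) follow the VOTable format, but I have left out the XML namespace and most of the metadata a real file would carry, and the object names and coordinates are made up. The read_table helper and the CONVERTERS mapping are hypothetical, not part of any IVOA library.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free VOTable-style document with invented values.
VOTABLE_XML = """\
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>obj-a</TD><TD>10.5</TD><TD>-3.25</TD></TR>
          <TR><TD>obj-b</TD><TD>200.0</TD><TD>45.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# Map a few column datatypes to Python converters (only a small subset).
CONVERTERS = {"double": float, "float": float, "int": int, "char": str}


def read_table(xml_text):
    """Return the first table as a list of dicts keyed by column name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD element describes one column: its name and its datatype.
    fields = [
        (field.get("name"), CONVERTERS[field.get("datatype")])
        for field in table.findall("FIELD")
    ]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append({name: conv(text) for (name, conv), text in zip(fields, cells)})
    return rows


rows = read_table(VOTABLE_XML)
for row in rows:
    print(row)
```

The point is that the column descriptions travel with the data, so a program that has never seen this particular archive can still work out that ra and dec are numbers measured in degrees.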

It isn't healthy to revel in the problems of others, and I don't believe that is what I am doing, as I really hope that they succeed. However, it is really nice to know that there are others in your field with tasks of scales that scare the crap out of you.

Sometimes the grass is greener on your side of the fence...