Every day, across England and Wales, councils plan new roads and new towns, change where you can park and how much it costs you, grant planning permission for plants that might have toxic emissions, and make many other decisions that affect people's daily lives. By law, people affected by these decisions can appeal against them, and any such proposals must be advertised through notices in the London Gazette.
“Published by Authority” since 1665, the London Gazette is the UK Government’s Official Journal and Newspaper of Record. It was set up to provide King Charles II with authoritative news while he and his court were in Oxford avoiding the Great Plague, and it has been in operation ever since. A new issue is published every working day by The Stationery Office (TSO) under contract to the Office of Public Sector Information (OPSI), the part of government responsible for the Gazettes. Each edition contains in the region of 300-500 official notices. It is a uniquely authoritative record of what is happening in the country.
The Gazette is a core part of the nation’s “information infrastructure”: as essential to society’s operation as the road network and the national grid. But the information held in the Gazette is currently mostly locked up in human-readable prose, in a paper version and as PDFs and (for modern notices) HTML on the Gazette website at http://www.london-gazette.gov.uk/.
In 2007 the government commissioned Steinberg & Mayo to conduct the independent Power of Information Review, to advise how best to respond to current trends on the web, such as social networking and data mashing. A key point was recognition that a government website is not always the most effective place to provide information to the citizen. Better that the information is where the users are, which means its reuse by others on the web. For example, food hygiene inspection reports have greater impact on restaurant review websites than when they’re hidden on the local Council website.
Into this environment steps The London Gazette, which provides an ideal way to publish semantically enabled official information for reuse, because of its reliability and status. SemWebbing the Gazette lays the foundation for the development of a new official publishing strategy by the government: any time legislation says that information must be published in the London Gazette, it will in effect be ensuring that the information is made publicly available, in a consistent way and in a reusable form. This is an exciting new role for the grand old London Gazette.
To realise this vision, the challenge is to add semantics to the information in the London Gazette in a way that facilitates its potential reuse, allowing others to go on and construct useful web applications using Gazettes data.
We chose to do that using RDFa because:
Our experience is that adding RDFa to an existing website is not as straightforward as might be hoped. The particular difficulties that we face are:
These issues, and the methods we used to overcome them, will be explored in more detail in our paper. We will also describe the ontology that we’re using and illustrate how we envision developers using the data exposed in the Gazette to create innovative web applications.
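As a rough illustration of how a developer might consume RDFa-annotated notices, the following Python sketch pulls subject/property/value statements out of a small fragment of markup. Everything specific in it is an assumption made for the example: the `ex:` vocabulary, the property names, the notice URI and the sample markup are hypothetical and do not represent the Gazette's ontology or its published pages. The extractor handles only a small subset of RDFa (`about`, `typeof`, `property`, `content`); a real application would more likely rely on a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical notice markup. The "ex:" vocabulary, the property names and
# the notice URI are invented for illustration only.
SAMPLE_NOTICE = """
<div about="http://example.org/notice/1" typeof="ex:PlanningNotice">
  <p>Notice published by <span property="ex:publisher">Example Council</span>
  on <span property="ex:published" content="2008-03-01">1 March 2008</span>.</p>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collects (subject, property, value) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one frame per currently open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The subject is set by @about, otherwise inherited from the parent.
        inherited = self._stack[-1]["subject"] if self._stack else None
        subject = attributes.get("about", inherited)
        if "typeof" in attributes:
            self.triples.append((subject, "rdf:type", attributes["typeof"]))
        frame = {"tag": tag, "subject": subject, "property": None, "text": []}
        if "property" in attributes:
            if "content" in attributes:
                # @content overrides the element's visible text.
                self.triples.append(
                    (subject, attributes["property"], attributes["content"])
                )
            else:
                # Otherwise collect the element's text until it closes.
                frame["property"] = attributes["property"]
        self._stack.append(frame)

    def handle_data(self, data):
        for frame in self._stack:
            if frame["property"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Pop frames until the matching element is closed, emitting any
        # property whose value was taken from the element text.
        while self._stack:
            frame = self._stack.pop()
            if frame["property"] is not None:
                value = " ".join("".join(frame["text"]).split())
                self.triples.append(
                    (frame["subject"], frame["property"], value)
                )
            if frame["tag"] == tag:
                break


def extract_triples(html):
    """Return the triples found in an HTML fragment, in document order."""
    parser = SimpleRDFaExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_NOTICE):
        print(triple)
```

Statements extracted this way could then be loaded into whatever store or index a reusing application prefers, which is the kind of reuse the Power of Information Review argues for.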
Jeni Tennison is an independent consultant currently contracted to TSO. She specialises in XSLT and XML schema development with forays into AJAX and RDF. She trained as a knowledge engineer, gaining a PhD in collaborative ontology development, and since becoming a consultant has worked in a wide variety of areas, including journal publishing, medieval manuscripts, legislation and financial services. She is author of several books including “Beginning XSLT 2.0” (Apress, 2005).
Jeni was an invited expert on the W3C’s XSL Working Group during the development of XSLT 2.0 and was one of the founders of the EXSLT initiative to standardise extensions to XSLT and XPath. She is currently working on the XProc pipeline definition language as an invited expert on the W3C’s XML Processing Working Group, on the Layered Markup and aNnotation Language (LMNL), and on the DataType Library Language (DTLL).
John Sheridan is Head of e-Services in the Information Policy and Services Directorate of The National Archives.