Increasing content usage through semantic enrichment: the pub2web Metastore
In 2006, Hewlett-Packard gave Publishing Technology its Best Applications Paper award for our presentation of the technology behind the Metastore, an RDF triple store built with Jena, the open source Java framework developed by the HP Labs Semantic Web Program.
The Metastore serves as the content backbone of pub2web. As an RDF (Resource Description Framework) triple store, it is a purpose-built database for storing and retrieving metadata as triples: data entities made up of a subject, a predicate and an object, such as “Asthma is a disease” or “Albuterol treats asthma.” This RDF-based semantic architecture allows pub2web to store and interpret content at a highly granular level.
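To make the triple model concrete, here is a minimal sketch of storing and querying the two example statements. It is a toy in-memory store written in plain Java for illustration only: it is not the Jena API and not the Metastore's implementation, and the identifiers (`isA`, `treats`, `TripleStoreSketch`) are hypothetical. A query is a pattern in which any position may be left open, which is the basic operation a triple store answers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a toy in-memory triple store.
// NOT the Jena API and NOT the Metastore's actual implementation.
class TripleStoreSketch {

    // One RDF-style statement: subject, predicate, object.
    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    // Store a triple; an identical triple is stored only once, as in an RDF graph.
    void add(String s, String p, String o) {
        Triple t = new Triple(s, p, o);
        if (!triples.contains(t)) {
            triples.add(t);
        }
    }

    // Pattern query: a null argument acts as a wildcard for that position.
    List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                out.add(t);
            }
        }
        return out;
    }

    int size() {
        return triples.size();
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        // The two example statements from the text, with hypothetical identifiers.
        store.add("Asthma", "isA", "Disease");
        store.add("Albuterol", "treats", "Asthma");
        // "What treats asthma?" leaves the subject open.
        for (Triple t : store.match(null, "treats", "Asthma")) {
            System.out.println(t.subject()); // prints Albuterol
        }
    }
}
```

A production store such as Jena adds URIs for every resource, typed literals, indexing and a query language (SPARQL), but the subject-predicate-object pattern match shown here is the underlying idea.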
The Metastore is data-agnostic, meaning it can handle every content format, including journals, books, images, multimedia, magazines, conference proceedings, reference works, blogs and reviews, while also representing the most granular concepts, such as a taxonomy item, an author name or a snippet of text. When content is matched against a subject taxonomy, the resulting semantic tags can be mined to generate concept homepages, related-content suggestions, autosuggest functionality, tag clouds, starburst visualizations and concept bar graphs showing each concept's relevance to an article, chapter or search result.
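Several of these features fall out of the same data: once items carry concept tags, a concept homepage is the list of items tagged with that concept, a tag cloud is a count per concept, and related content can be ranked by tag overlap. The sketch below assumes tags have already been assigned from the taxonomy and uses Jaccard similarity (shared tags divided by combined tags) for ranking; both the data model and the scoring choice are illustrative assumptions, not pub2web's documented method, and the class and item names are hypothetical.

```java
import java.util.*;

// Illustrative sketch: items tagged with taxonomy concepts (hypothetical data model).
class ConceptIndexSketch {
    // item id -> concept tags, kept in insertion order of items
    private final Map<String, Set<String>> tagsByItem = new LinkedHashMap<>();

    void tag(String item, String concept) {
        tagsByItem.computeIfAbsent(item, k -> new TreeSet<>()).add(concept);
    }

    // "Concept homepage": every item tagged with the concept.
    List<String> itemsFor(String concept) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : tagsByItem.entrySet()) {
            if (e.getValue().contains(concept)) out.add(e.getKey());
        }
        return out;
    }

    // Tag-cloud weights: how many items carry each concept, sorted by concept name.
    Map<String, Integer> tagCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> tags : tagsByItem.values()) {
            for (String t : tags) counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Related content: other items ranked by Jaccard overlap of their concept tags.
    List<String> related(String item, int limit) {
        Set<String> mine = tagsByItem.getOrDefault(item, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : tagsByItem.entrySet()) {
            if (e.getKey().equals(item)) continue;
            Set<String> shared = new HashSet<>(mine);
            shared.retainAll(e.getValue());
            if (shared.isEmpty()) continue; // no common concept, not related
            Set<String> union = new HashSet<>(mine);
            union.addAll(e.getValue());
            scores.put(e.getKey(), (double) shared.size() / union.size());
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest overlap first; ties broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        ConceptIndexSketch idx = new ConceptIndexSketch();
        idx.tag("article-1", "Asthma");
        idx.tag("article-1", "Albuterol");
        idx.tag("chapter-7", "Asthma");
        idx.tag("chapter-7", "Allergy");
        idx.tag("image-3", "Albuterol");
        idx.tag("image-3", "Asthma");
        System.out.println(idx.itemsFor("Asthma"));      // [article-1, chapter-7, image-3]
        System.out.println(idx.tagCounts());             // {Albuterol=2, Allergy=1, Asthma=3}
        System.out.println(idx.related("article-1", 2)); // [image-3, chapter-7]
    }
}
```

Because the tags attach to item identifiers rather than to a particular format, the same index works whether an item is an article, a chapter or an image, which is the practical benefit of a data-agnostic store.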
The outcome of a Metastore-driven semantic platform is content that is highly discoverable, increasing traffic and site “stickiness.” At the same time, publishers gain the flexibility to sell fragments, create bundles and experiment with new business models across all of their assets.