SWIG-UK Special Event: Alberto Reggiori & Andrea Marchesini – The BBC Content Aggregator for the MemoryShare Service

Alberto and Andrea presented some of the work they have been doing at Asemantics. Most notably, they showed how they are using Semantic Web technologies to develop a new generation of feed aggregators for the BBC’s MemoryShare service, which is described as an archive of memories and events from around 1900 to today.

One of the key messages that Alberto tried to convey was around the adoption of RDF and the difficulties of trying to use it to solve the various problems faced by the SemWeb community. In his opinion:

  • RDF is complex, because it tries to solve too many problems at once
  • Search is hard
  • Granularity management and read/write access are hard
  • We currently have a poor software tool chain

He argues that the solution is to combine existing Web 2.0 technologies with RDF, hiding the RDF itself and instead presenting data in formats that are more widely accepted and entrenched, because customers don’t get the Semantic Web or RDF. I think Alberto got the biggest laugh of the day when he likened the adoption of RDF to the Resurrection, summarily pronouncing on one slide that “RDF is Dead”, only to have it resurrected three days later!

One of the things that Alberto and Andrea presented was their current work on specifying and developing SPARQL to Objects (S2O), a SPARQL extension that maps RDF graphs to JSON objects. Whilst the output format seems pretty friendly, I’m not convinced I like how it binds the semantics of the output to the semantics of the query – but I guess there are some advantages to this approach.
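
To make my concern concrete, here is a minimal sketch of the general pattern as I understood it. This is not S2O’s actual syntax, and all the URIs and field names are hypothetical: flat SPARQL SELECT bindings get regrouped into nested JSON objects, so the keys of the JSON follow the variables of the query.

```python
import json
from collections import defaultdict

# Hypothetical flat bindings, as returned by a SPARQL SELECT such as:
#   SELECT ?memory ?title ?date WHERE { ?memory dc:title ?title ; dc:date ?date }
bindings = [
    {"memory": "http://example.org/memory/1", "title": "VE Day", "date": "1945-05-08"},
    {"memory": "http://example.org/memory/2", "title": "Moon landing", "date": "1969-07-20"},
]

# Regroup the flat rows into one JSON object per subject. The available
# keys ("title", "date") are dictated by the query's variables, which is
# the binding of output semantics to query semantics I mention above.
objects = defaultdict(dict)
for row in bindings:
    objects[row["memory"]]["title"] = row["title"]
    objects[row["memory"]]["date"] = row["date"]

print(json.dumps(objects, indent=2))
```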

I enjoyed Alberto’s talk and was able to spend a bit of time chatting to him during one of the breaks; he’s a passionate researcher with some interesting ideas. He recently recorded a podcast with my colleague Paul Miller as part of our Talking with Talis series, which you can listen to here.

SWIG-UK Special Event: Leigh Dodds – Facet: Building Web Pages with SPARQL

You can view the slides from Leigh’s presentation here.

Leigh is CTO at Ingenta. They have built a web framework, called Facet, for building web applications on top of RDF. In their opinion there was no good system for integrating RDF repositories with an existing web framework in Java. Although the framework does have some limitations, it seems to me that it is quite simple and perhaps even elegant.

It appears that by embracing some limitations in RDF modelling, Leigh has succeeded in building a framework that, on the face of it, provides a fairly flexible means of building web pages from an RDF repository, and because of the way it’s designed and built it lends itself to being integrated very easily into existing templating environments (JSP, Velocity etc.).

Leigh was asked several questions by the audience, and his answers provided further insight:

Question: How do you use this for searching when you get a list of results back?

Answer: Not using this for searching.

Which to me makes perfect sense: each of the queries that are configured returns a sub-graph, or lens, that is effectively a view of the data which you can pass to a templating engine for rendering.
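
As an illustration of that pattern, here’s my own sketch rather than Facet’s actual API (which wasn’t shown in detail): a configured query extracts the values for a lens over one resource, and the result is handed to an ordinary template for rendering. I’m using Python and rdflib as a stand-in for the Java stack Facet actually targets.

```python
from string import Template

import rdflib  # generic RDF library standing in for the Java stack

graph = rdflib.Graph()
graph.parse(data="""
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/article/1> dc:title "Faceted browsing" ;
                               dc:creator "L. Dodds" .
""", format="turtle")

# A configured query that extracts a "lens": the view of one resource
# that becomes the data handed to the template.
lens = graph.query("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?title ?creator WHERE {
        <http://example.org/article/1> dc:title ?title ; dc:creator ?creator .
    }
""")

page = Template("<h1>$title</h1><p>by $creator</p>")
for row in lens:
    print(page.substitute(title=row.title, creator=row.creator))
```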

Question: Is the schema annotation mechanism for a known data set rather than in general?

Answer: Yes, it’s application-specific and configurable at the application level.

Again, I thought Leigh had made this clear during the presentation, so it should have been obvious. Whilst some might consider this to be a limitation, I wouldn’t necessarily view it as such.

Question: Have you considered how your framework might work with rich clients, Ajax etc.?

Answer: That’s why they support JSON output. They’re only doing basic AJAX lookups at the moment.

This is one of the features of the framework that does pique my interest: as we move more and more towards building richer client interfaces on the web, there is an expectation that web frameworks and web services should support outputting data in JSON. At the moment our Platform doesn’t formally support this, but it is something we are definitely intending to do.

It makes sense to provide data back to the client in the format they need, rather than in a fixed format that the application then has to process and convert. I’ve seen the problem when building desktop widgets: whilst XML is great and portable, most widget frameworks are based on ECMAScript and understand JSON natively, so wouldn’t it be nicer if web services returned JSON?
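
A toy comparison of what I mean, in Python for brevity, assuming a hypothetical service that can return the same record as either XML or JSON:

```python
import json
import xml.etree.ElementTree as ET

# The same record, as a service might return it in each format.
as_xml = '<item><title>Some Widget Data</title><count>3</count></item>'
as_json = '{"title": "Some Widget Data", "count": 3}'

# With XML the client has to parse, navigate and convert types itself...
root = ET.fromstring(as_xml)
record = {"title": root.findtext("title"), "count": int(root.findtext("count"))}

# ...whereas JSON deserialises straight into native data structures,
# which is what an ECMAScript-based widget framework gets for free.
record2 = json.loads(as_json)

assert record == record2
```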

Anyway ldodds++ 🙂

Question: Will you open source it?

Answer: Hopefully it will be; it needs to be disentangled first, but he wanted to share the ideas here today so people can get a sense of the value.

I’m hoping that they do; I’d like to have a play around with the framework and possibly even extend it.

All in all I was actually pretty impressed with Leigh’s talk; I’ll be keeping an eye out for Facet.

SWIG-UK Special Event: Graham Klyne – Building a Semantic Web Accessible Image Publication Repository

Graham begins by offering a little background information on why they want to be able to publish images using SemWeb technologies.

Previous approaches involved general-purpose image databases based on conventional relational technology, which were useful and worked, but died due to licensing restrictions on the data.

With semantic web technologies emerging, a semantic image database was created using native RDF storage. There was some success, but they had to develop all the heavy lifting themselves, and there was fragility due to tightly coupled components. Graham commented on how this touched on Ian Davis’s talk: how useful a platform might be, and how difficult it is to build one.

Their current approach is based on the Data Web philosophy: the idea of linking available web data rather than creating new application stores. They based this on Southampton University’s EPrints, and they use Jena and Joseki to provide a SPARQL endpoint for the metadata.

What does “semantic web accessible” mean?

Image metadata is accessible and queryable from multiple sources using SPARQL, and the images themselves should be accessible using simple HTTP requests.

EPrints is an “OAI repository” which uses common metadata standards (e.g. Dublin Core).
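
To illustrate what that accessibility amounts to in practice, here’s my own sketch against a hypothetical endpoint and image URL, not their actual deployment:

```python
import urllib.parse
import urllib.request

# Hypothetical Joseki-style SPARQL endpoint: find images and their
# Dublin Core titles across the repository's metadata.
ENDPOINT = "http://example.org/joseki/sparql"  # placeholder URL
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?image ?title WHERE { ?image dc:title ?title } LIMIT 10
"""

url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY})
with urllib.request.urlopen(url) as response:
    results = response.read()  # the queryable metadata

# The image itself is just a web resource: a plain HTTP GET retrieves it.
with urllib.request.urlopen("http://example.org/images/fly-embryo.png") as img:
    data = img.read()
```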

Graham goes on to talk about global vs. local access when retrieving metadata from multiple repositories: they want to get away from a single, globally coordinated index towards more local, uncoordinated ones.

The problem with this is that people will use different schemas, so over time connecting data together becomes an issue. They are therefore looking at developing strategies to address this, and over time perhaps a common schema might emerge. Currently there are two strategies for metadata conversion:

  • Metadata re-writing – which involves making an extra copy of the data.
  • Query re-writing – instead of changing or copying the data, they change the query (a rough sketch of this follows below).
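
Here is a rough sketch of the second strategy, under the simplifying assumption that the rewrite is a plain term-for-term substitution between two schemas (real query rewriting is subtler than this, and the local predicate names are hypothetical):

```python
# Map terms from a local schema onto the Dublin Core terms a remote
# repository actually uses; the query text is rewritten, the data is not.
TERM_MAP = {
    "local:imageTitle": "dc:title",   # hypothetical local predicates
    "local:capturedOn": "dc:date",
}

def rewrite_query(sparql: str) -> str:
    """Rewrite a SPARQL query from the local schema to the remote one."""
    for local_term, remote_term in TERM_MAP.items():
        sparql = sparql.replace(local_term, remote_term)
    return sparql

local_query = """
SELECT ?img ?when WHERE { ?img local:imageTitle ?t ; local:capturedOn ?when }
"""
print(rewrite_query(local_query))  # ready to send to the remote endpoint
```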

Graham believes a combination of the two is required. He goes on to give an overview of their implementation: collect data using OAI, and then use Joseki. They have had to modify the EPrints software, as well as the database, to accommodate domain metadata. What they have is not a generic solution, but they achieved the creation of a platform (not a semantic web platform) within six weeks, which is quite different from their previous experiences.
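
As a very rough sketch of that pipeline (my own approximation in Python; they are actually using the Java Jena/Joseki stack, and the repository URL is a placeholder): harvest Dublin Core records over OAI-PMH, turn them into triples, and load them into the store behind the SPARQL endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# OAI-PMH harvest of Dublin Core records from an EPrints-style repository.
BASE = "http://example.org/eprints/oai2"  # placeholder repository URL
url = BASE + "?" + urllib.parse.urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc"})
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
triples = []
for record in tree.iter(OAI + "record"):
    subject = record.findtext(".//" + OAI + "identifier")
    for field in ("title", "date", "creator"):
        for elem in record.iter(DC + field):
            triples.append((subject, "dc:" + field, elem.text))

# These triples would then be loaded into the store that the SPARQL
# endpoint serves (Jena/Joseki in their case).
print(len(triples), "triples harvested")
```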

They are currently looking at implementing user interfaces, so they are investigating tools that can do faceting. They have done some experimentation with JSpace and have implemented a faceted browser that uses a Joseki endpoint. They are also looking at trying to use mSpace, from Southampton, but haven’t been able to get hold of the software yet. Graham goes on to show some screenshots of their user interface.

Lessons learned:

  • Available tools do support Semantic Web accessibility; Joseki has been key to their progress.
  • Creating effective user interfaces is not easy.
  • Importance of loose coupling.

Wish list:

  • SPARQL Update support in Joseki, which will facilitate metadata updates from external hosts.
  • A more generic web data publishing tool (e.g. METS).
  • Query distribution.
  • Merging data from multiple uncoordinated sources (FlyWeb).
  • Improving the user interface will be an ongoing task.

My Thoughts

What strikes me is that their requirements aren’t a million miles away from some of the basic services that the Platform provides. I’d be curious to see what they might be able to achieve if they had a Platform store, which combines metadata and content together, or whether the same problem could be solved differently on the Platform.

Question: Can you give us some indication of the user tasks you’re trying to support? Graham describes the process scientists currently go through. From the lengthy description he provides, the interaction designer in me wonders whether taking a UCD approach would be a better way to evolve an interface that would support them.

SWIG-UK Special Event: Ian Davis on the Talis Platform

You can view the slides for Ian’s presentation here.

Ian begins by describing the platform as a multi-tenant database with a REST-based API. There are pools of content and metadata called Stores, to which you can add content, and from which you can search and retrieve data and binaries.

We want to bring the platform to as many developers as possible.

We use REST but also adopt existing protocols such as RSS; this is so that we can re-use data formats and protocols where they exist, and create and document them where they don’t. Any data stored in the platform is still your data.

Ian describes the API next; he talks about how you can use it to do the following (a rough sketch of a couple of these calls appears after the list):

  • Add content to a store using POST (http://api.talis.com/stores/mystore/items)
  • Search content in a store using GET (http://api.talis.com/stores/mystore/items)
  • Add metadata by POSTing RDF/XML in bulk (http://api.talis.com/stores/mystore/meta); you can also POST Change Sets, which are lists of reified triples with a common subject.
  • Search metadata using SPARQL (http://api.talis.com/stores/mystore/services/sparql); this is limited to searching the metabox for a given store. Each store also has a multisparql service for searching multiple graphs.
  • Augmentation (http://api.talis.com/stores/mystore/services/augment) – supply an RSS feed and augment it with additional triples; in other words, take a search from one store and chain it with augmentation from another.
  • Faceting (http://api.talis.com/stores/mystore/services/facet) – uses indexed metadata to build facets for search terms.
  • OAI (http://api.talis.com/stores/mystore/services/oai-pmh) – the standard archiving and harvesting protocol.
  • Snapshots – you can programmatically request a snapshot of your store. This produces a tar file, accessible via HTTP, which contains all the items from the content box, all the RDF, etc.
  • Security – a coarse-grained capability model, using authentication via HTTP Digest with URI-based identities.
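
Here is a minimal sketch of what using two of these calls might look like from Python. The URLs are the documented examples above, but the payload, content type and credential handling are my assumptions, so treat this as an outline rather than a working client:

```python
import urllib.parse
import urllib.request

STORE = "http://api.talis.com/stores/mystore"

# Add an item to the store's content box with a plain HTTP POST.
# (The payload and content type here are assumptions, not the documented ones.)
item = b"<item><title>An example record</title></item>"
request = urllib.request.Request(
    STORE + "/items", data=item,
    headers={"Content-Type": "application/xml"}, method="POST")
# urllib.request.urlopen(request)  # would also need HTTP Digest credentials

# Query the store's metabox with SPARQL via a simple GET.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = STORE + "/services/sparql?" + urllib.parse.urlencode({"query": query})
# with urllib.request.urlopen(url) as response:
#     print(response.read())
```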

Ian then goes on to talk about some of our future plans:

  • Relevance ranking for RDF – use relations between resources to influence ranking, as well as discover resources based on text search of their associated resources.
  • Personalisation and recommendation services – resources that are similar to X tend to have Y; trails and suggestions based on usage.

Ian describes the architecture of the platform and some of the technologies it is built upon, for example Jena. Ian also talks about our goals in terms of scaling and resilience, and our aim for zero downtime.

Ian goes on to describe Marvin, a development project we are working on to deal with parallel data processing, the idea being that all content submitted to the platform is processed in parallel.

Ian also talks about Majat, another development research project, which looks at distributed storage and search.

Ian then goes on to show some examples of how the platform is currently being used, demonstrating some of the applications we have built:

  • Talis Engage – a community information application that uses SKOS, SIOC and FOAF.
  • Talis Prism – library catalogue search.
  • Project Zephyr – academic resource/reading list management. Ian also demoed our relationship browser, which is embedded in Zephyr and allows users to explore data in the platform.

Questions and Answers

Question: What SemWeb capabilities are customers warming to? Answer: It’s still early days.

Question: Are you doing reasoning in the platform? Answer: Not yet.

Question: How much risk is involved in exposing a SPARQL service? Answer: Some risk; someone could write a horrible SPARQL query.

Question: Would you consider releasing this as a product and not a service? Answer: No, we are offering the platform as SaaS.

Question: Can you categorise the kinds of apps this is best suited for? Answer: Any application that is information-rich.

Semantic Web Interest Group – Special UK Event

I was fortunate enough to attend Friday’s SWIG-UK Special Event, hosted at HP Research Labs in Bristol. It was a wonderful day, full of very interesting talks from a pretty diverse range of speakers describing how they are using SemWeb technologies to solve problems. Naturally, we Talisians were there talking about the Talis Platform, explaining what it is and showing some of the commercial applications we have built upon it. The work we are doing at Talis, and the progress we have made in the development of our platform, were received very well, which I have to say was a great feeling.

The day was also about meeting and making contacts amongst the SWIG community, and from that point of view it was a great success: I got the chance to meet some very interesting individuals who are working on some amazing projects. I got the distinct impression that there is a great deal of potential in the idea of letting some of these individuals try out their ideas on our Platform, and that’s something I am really excited about.

I have made notes on a number of the presentations from Friday, which I will post up over the next couple of days.