Why is my SPARQL query so slow?

Question

I'm trying to run a number of queries against the database of the EU plenary debates through a SPARQL interface (Interface here, schema here). For each speech I would like to retrieve the speaker's name, home country and party name. This takes 5 minutes to complete for each agenda item, which seems slow. Am I making any obvious mistakes in my query that are slowing it down?

SELECT ?text (SAMPLE(?speaker) AS ?speaker) (SAMPLE(?given) AS ?given) (SAMPLE(?surname) AS ?surname) (SAMPLE(?acronym) AS ?country) (SAMPLE(?partyLabel) AS ?partyLabel) (SAMPLE(?type) AS ?type)
WHERE {
    dcterms:hasPart ?speech.
   ?speech lpv:speaker ?speaker.
   ?speaker foaf:givenName ?given.
   ?speaker foaf:familyName ?surname.
   ?speaker lpv:countryOfRepresentation ?country.
   ?country lpv:acronym ?acronym.
   ?speech lpv:translatedText ?text.
   ?speaker lpv:politicalFunction ?func.
   ?func lpv:institution ?institution.
   ?institution rdfs:label ?partyLabel.
   ?institution rdf:type ?type.
   FILTER(langMatches(lang(?text), "en"))
} GROUP BY ?text

Note: changing ?speech lpv:translatedText ?text. to ?speech lpv:text ?text. reduces the query time to 30 seconds.

RobV · Accepted Answer

There doesn't appear to be anything particularly wrong with your SPARQL query, and you have made no obvious mistakes (other than some syntax validity issues, which I discuss later).

The problem appears to be that the SPARQL service you are using is backed by a triple store that doesn't cope well with queries containing large numbers of joins. When I experimented with your query, moving the triple patterns around produced a stack overflow in the SPARQL service!

I would suggest downloading the data yourself from http://linkedpolitics.ops.few.vu.nl/home; the download links are under point 3 of the About the Data section. You can then load it into the triple store of your choice and run your query against that instead.

For example I downloaded the data and put it into Apache Jena Fuseki (disclaimer - I work on the Apache Jena project) and was able to run the query almost instantaneously after I fixed the query to be proper valid SPARQL.
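As a rough sketch of what querying a local store can look like, the Python below posts a query to a SPARQL endpoint using the standard SPARQL 1.1 Protocol and flattens the JSON results into plain dictionaries. The endpoint URL (a Fuseki server on its default port 3030 serving a dataset I have called "debates") and the helper names are assumptions for illustration only, so adjust them to match however you set up your own store.

```python
import json
import urllib.request

# Assumed endpoint: Fuseki on its default port with a hypothetical dataset "debates".
ENDPOINT = "http://localhost:3030/debates/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol POST request carrying the query as the body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return one dict per result row (needs a running server)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return rows(json.load(response))
```

Calling run_query with the full query text (prefixes included) should then return one dictionary per result row, keyed by variable name. It only works while the local server is up and the data has been loaded.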

Making the Query valid SPARQL

The query as given is not strictly valid SPARQL so you'll need to correct it in order to run it elsewhere.

Firstly, the various prefixes used are not defined by the query because the service you are using inserts them automatically. To run this query against another triple store you'll need to add the following to the start of the query:

PREFIX dcterms:  
PREFIX foaf:  
PREFIX lpv:  
PREFIX rdf:  
PREFIX rdfs:

It is also not legal to perform a variable assignment where the variable name given is already in scope, e.g. (SAMPLE(?speaker) AS ?speaker), so those need to change:

(SAMPLE(?speaker) AS ?speaker1)

Which results in the following valid and portable SPARQL query:

PREFIX dcterms:  
PREFIX foaf:  
PREFIX lpv:  
PREFIX rdf:  
PREFIX rdfs:  

SELECT ?text (SAMPLE(?speaker) AS ?speaker1) (SAMPLE(?given) AS ?given1) (SAMPLE(?surname) AS ?surname1) (SAMPLE(?acronym) AS ?country1) (SAMPLE(?partyLabel) AS ?partyLabel1) (SAMPLE(?type) AS ?type1)
WHERE {
    dcterms:hasPart ?speech.
   ?speech lpv:speaker ?speaker.
   ?speaker foaf:givenName ?given.
   ?speaker foaf:familyName ?surname.
   ?speaker lpv:countryOfRepresentation ?country.
   ?country lpv:acronym ?acronym.
   ?speech lpv:translatedText ?text.
   ?speaker lpv:politicalFunction ?func.
   ?func lpv:institution ?institution.
   ?institution rdfs:label ?partyLabel.
   ?institution rdf:type ?type.
   FILTER(langMatches(lang(?text), "en"))
} GROUP BY ?text
