iNikkz

Reputation: 3819

Timeout error in a DBpedia SPARQL query: is there a limit on the number of fields?

I am using the DBpedia SPARQL endpoint and trying to retrieve a list of persons with their details.

SPARQL Query (Not working):

    SELECT DISTINCT ?dbpedia_link ?freebase_link str(?abstract) as ?abstract str(?activeYearsStartYear) as ?activeYearsStartYear str(?alias) as ?alias
                    str(?birthDate) as ?birthDate str(?birthName) as ?birthName str(?birthPlace) as ?birthPlace str(?children) as ?children
                    str(?label) as ?label str(?occupation) as ?occupation str(?otherNames) as ?otherNames str(?residence) as ?residence
                    str(?shortDescription) as ?shortDescription str(?spouse) as ?spouse str(?description) as ?description str(?subject) as ?subject
                    str(?comment) as ?comment str(?almaMater) as ?almaMater str(?award) as ?award str(?education) as ?education str(?knownFor) as ?knownFor
                    str(?networth) as ?networth str(?parents) as ?parents str(?salary) as ?salary str(?viafId) as ?viafId str(?wikiPageID) as ?wikiPageID
                    str(?wikiPageRevisionID) as ?wikiPageRevisionID  WHERE {
                {
                    ?dbpedia_link rdf:type dbpedia-owl:Person
                }
                OPTIONAL {?dbpedia_link dbpedia-owl:abstract ?abstract. }
                OPTIONAL {?dbpedia_link dbpedia-owl:activeYearsStartYear ?activeYearsStartYear .} 
                OPTIONAL {?dbpedia_link dbpedia-owl:alias ?alias .} 
                OPTIONAL {?dbpedia_link dbpprop:birthDate ?birthDate .} 
                OPTIONAL {?dbpedia_link dbpprop:birthName ?birthName .} 
                OPTIONAL {?dbpedia_link dbpprop:birthPlace ?birthPlace .} 
                OPTIONAL {?dbpedia_link dbpprop:children ?children .} 
                OPTIONAL {?dbpedia_link rdfs:label ?label .} 
                OPTIONAL {?dbpedia_link dbpprop:occupation ?occupation .} 
                OPTIONAL {?dbpedia_link dbpprop:otherNames ?otherNames .} 
                OPTIONAL {?dbpedia_link dbpprop:residence ?residence .} 
                OPTIONAL {?dbpedia_link dbpprop:shortDescription ?shortDescription .} 
                OPTIONAL {?dbpedia_link dbpprop:spouse ?spouse .} 
                OPTIONAL {?dbpedia_link dc:description ?description .} 
                OPTIONAL {?dbpedia_link dcterms:subject ?subject .} 
                OPTIONAL {?dbpedia_link rdfs:comment ?comment .} 
                OPTIONAL {?dbpedia_link dbpprop:almaMater ?almaMater .} 
                OPTIONAL {?dbpedia_link dbpprop:awards ?award .}  
                OPTIONAL {?dbpedia_link dbpprop:education ?education .}  
                OPTIONAL {?dbpedia_link dbpprop:knownFor ?knownFor .}  
                OPTIONAL {?dbpedia_link dbpprop:networth ?networth .}  
                OPTIONAL {?dbpedia_link dbpprop:parents ?parents .}  
                OPTIONAL {?dbpedia_link dbpprop:salary ?salary .}  
                OPTIONAL {?dbpedia_link dbpedia-owl:viafId ?viafId .}  
                OPTIONAL {?dbpedia_link dbpedia-owl:wikiPageID ?wikiPageID .}  
                OPTIONAL {?dbpedia_link dbpedia-owl:wikiPageRevisionID ?wikiPageRevisionID .}  
                OPTIONAL {?dbpedia_link owl:sameAs ?freebase_link
                FILTER regex(?freebase_link, "^http://rdf.freebase.com") .}
                OPTIONAL {?dbpedia_link dcterms:subject ?sub .}
            }LIMIT 2 Offset 5

I have set the limit to 2 and the offset to 5, yet it gives a timeout error, and I don't know why.

But when I removed half of the fields and their OPTIONAL clauses from the query, it returned results and worked fine.

SPARQL query (working):

    SELECT DISTINCT ?dbpedia_link str(?abstract) as ?abstract str(?activeYearsStartYear) as ?activeYearsStartYear str(?alias) as ?alias
                str(?birthDate) as ?birthDate str(?birthName) as ?birthName str(?birthPlace) as ?birthPlace str(?children) as ?children
                str(?label) as ?label str(?occupation) as ?occupation str(?otherNames) as ?otherNames str(?residence) as ?residence
                WHERE {
            {
                ?dbpedia_link rdf:type dbpedia-owl:Person
            }
            OPTIONAL {?dbpedia_link dbpedia-owl:abstract ?abstract. }
            OPTIONAL {?dbpedia_link dbpedia-owl:activeYearsStartYear ?activeYearsStartYear .} 
            OPTIONAL {?dbpedia_link dbpedia-owl:alias ?alias .} 
            OPTIONAL {?dbpedia_link dbpprop:birthDate ?birthDate .} 
            OPTIONAL {?dbpedia_link dbpprop:birthName ?birthName .} 
            OPTIONAL {?dbpedia_link dbpprop:birthPlace ?birthPlace .} 
            OPTIONAL {?dbpedia_link dbpprop:children ?children .} 
            OPTIONAL {?dbpedia_link rdfs:label ?label .} 
            OPTIONAL {?dbpedia_link dbpprop:occupation ?occupation .} 
            OPTIONAL {?dbpedia_link dbpprop:otherNames ?otherNames .} 
            OPTIONAL {?dbpedia_link dbpprop:residence ?residence .} 
        }LIMIT 2 offset 5

But I don't know why it does not work with all the fields.

Is there a limit on the number of fields in DBpedia SPARQL?

Upvotes: 1

Views: 224

Answers (2)

Joshua Taylor

Reputation: 85813

With that many variables, all of them optional, you are already going to need some post-processing of the results. As such, I'd suggest that you just ask for persons and for any property in that list of properties, using values. E.g.:

select distinct ?s ?p ?o {
  values ?p { dbpedia-owl:abstract
              dbpedia-owl:activeYearsStartYear
              dbpedia-owl:alias
              dbpprop:birthDate
              dbpprop:birthName
              dbpprop:birthPlace
              dbpprop:children
              rdfs:label
              dbpprop:occupation
              dbpprop:otherNames
              dbpprop:residence }
  ?s a dbpedia-owl:Person ; ?p ?o .
}
order by ?s ?p
limit 100
offset 50


That has a lot more rows, since there is one per property value, but it doesn't time out. Ordering by ?s and then by ?p groups the rows by person, with the properties in a predictable order, so post-processing shouldn't be all that hard. In fact, you could even use optional here, so that every person gets at least one row for each property (with ?o unbound when the value is missing), which would make it very easy (but I haven't tested this):

select ?s ?p ?o {
  values ?p { #-- ...
            }
  ?s a dbpedia-owl:Person .
  optional { ?s ?p ?o }
}
order by ?s ?p

Upvotes: 1

Jörn Hees

Reputation: 3428

It's both a limitation and a feature.

If you run your first query on http://dbpedia.org/sparql and read the reply, it should say:

Virtuoso 42000 Error The estimated execution time 4626142 (sec) exceeds the limit of 240 (sec).

This essentially tells you that your query is pretty complex. The query planner estimated that it would need 4626142 seconds (~54 days) to run your query. As DBpedia is a free, best-effort service, the endpoint refuses such queries so that it can provide good service to as many people as possible.

As you realized, your query gets a lot less complicated with fewer OPTIONAL clauses. You might be unaware that you're asking for a cross join (Cartesian product) of all the matching values for the variables in all the optional clauses. There are far fewer value combinations if you bind fewer variables.
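
To see the cross-join effect on a single resource, a query along these lines should do. It is only an untested sketch: Albert_Einstein is an arbitrary example resource (an assumption, not something from the question), and the ?rows, ?subjects and ?labels names are made up for illustration. Because ?subject and ?label are bound in separate optional clauses, every subject value is paired with every label value, so ?rows should be the product of the two distinct counts whenever both properties have at least one value:

```sparql
select ?s
       (count(*) as ?rows)
       (count(distinct ?subject) as ?subjects)
       (count(distinct ?label) as ?labels)
{
  # Assumption: an arbitrary example resource; any person with several
  # values per property shows the same effect.
  values ?s { <http://dbpedia.org/resource/Albert_Einstein> }
  optional { ?s dcterms:subject ?subject }
  optional { ?s rdfs:label ?label }
}
group by ?s
```

Each further optional property multiplies the row count again, which is why the estimate grows so quickly as clauses are added.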

If you're only interested in one value per variable, you might want to have a look at the SAMPLE keyword.
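
For example, a trimmed-down version of the question's query using SAMPLE might look like the following. This is an untested sketch: only three of the properties are shown, and the ?anAbstract, ?aBirthDate and ?aLabel output names are made up for illustration. Grouping by ?s gives one row per person, and SAMPLE picks one arbitrary value from each group:

```sparql
select ?s
       (sample(?abstract) as ?anAbstract)
       (sample(?birthDate) as ?aBirthDate)
       (sample(?label) as ?aLabel)
{
  ?s a dbpedia-owl:Person .
  # Each optional can still match several values per person;
  # sample() collapses them to a single value in the output.
  optional { ?s dbpedia-owl:abstract ?abstract }
  optional { ?s dbpprop:birthDate ?birthDate }
  optional { ?s rdfs:label ?label }
}
group by ?s
limit 2
offset 5
```

Whether the planner still estimates the version with all properties as too expensive is something to check against the endpoint.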

Upvotes: 2
