singha
singha

Reputation: 11

Virtuoso 42000 Error The estimated execution time

Using DBpedia-Live SPARQL endpoint http://dbpedia-live.openlinksw.com/sparql, I am trying to count the total number of triples associated with the instances of type owl:Thing. As the count is really big, an exception is being thrown "Virtuoso 42000 Error The estimated execution time". To get rid of this I tried to use subselect, limit, and offset in the query. However, when the offset is greater than equal to the limit, the solution isn't working and the same exception is being thrown again (Virtuoso 42000 Error), can anyone please identify the problem with my query? Or suggest a workaround? Provided is the query I was trying:

select count(?s) as ?count
where
{
?s ?p ?o
  {
      select ?s
      where
      {
          ?s rdf:type owl:Thing.
      }
    limit 10000
    offset 10000
  }
}

Upvotes: 0

Views: 993

Answers (1)

TallTed
TallTed

Reputation: 9444

Your solution starts with patience. Virtuoso's Anytime Query feature returns some results when a timeout strikes, and keeps running the query in the background -- so if you come back later, you'll typically get more solutions, up to the complete result set.

I had to guess at your original query, since you only posted the piecemeal one you were trying to use --

select ( count(?s) as ?count )
where
{
          ?s rdf:type owl:Thing.
}

I got 3,923,114 within a few seconds, without hitting any timeout. I had set a timeout of 3000000 milliseconds (= 3000 seconds = 50 minutes) on the form -- in contrast to the endpoint's default timeout of 30000 milliseconds (= 30 seconds) -- but clearly hit neither of these, nor the endpoint's server-side configured timeout.

I think you already understand this, but please do note that this count is a moving target, and will change regularly as the DBpedia-Live content continues to be updated from the Wikipedia firehose.


Your divide-and-conquer effort has a significant issue. Note that without an ORDER BY clause in combination with your LIMIT/OFFSET clauses, you may find that some solutions (in this case, some values of ?s) repeat and/or some solutions never appear in a final aggregation that combines all those partial results.

Also, as you are trying to count triples, you should probably do a count(*) instead of count (?s). If nothing else, this helps readers of the query understand what you're doing.


Toward being able to adjust such execution time limits as your query is hitting -- the easiest way would be to instantiate your own mirror via the the DBpedia-Live AMI; unfortunately, this is not currently available for new customers, for a number of reasons. (Existing customers may continue to use their AMIs.) We will likely revive this at some point, but the timing is indefinite; you could open a Support Case to register your interest, and be notified when the AMI is made available for new users.


Toward an ultimate solution... There may be better ways to get to your actual end goal than those you're currently working on. You might consider asking on the DBpedia mailing list or the OpenLink Community Forum.

Upvotes: 1

Related Questions