RDangol

Reputation: 189

SPARQL query - why is fetching distinct predicates so slow?

While exploring some SPARQL queries, I noticed that fetching distinct predicates is extremely slow, whereas fetching distinct subjects or objects shows no such problem.

I tested this with LinkedGeoData, running the following queries at LinkedGeoData's own endpoint (without the SERVICE clause in that case, for obvious reasons), in SPARQL Playground, and on an Apache Jena Fuseki server. The behavior was the same everywhere. Can anyone help me understand the reason behind it?

# Selecting distinct subjects. Executes fast.
SELECT * WHERE {
  SERVICE <http://linkedgeodata.org/sparql> {
    SELECT DISTINCT ?s
    WHERE {
      ?s ?p ?o .
    }
    LIMIT 100
  }
}

# Selecting distinct predicates. VERY SLOW.
SELECT * WHERE {
  SERVICE <http://linkedgeodata.org/sparql> {
    SELECT DISTINCT ?p
    WHERE {
      ?s ?p ?o .
    }
    LIMIT 100
  }
}

Upvotes: 3

Views: 717

Answers (1)

TallTed

Reputation: 9444

Answered in comments by @AKSW; rephrased a bit here --

Usually, the schema of a dataset comprises far fewer triples than the instance data; i.e., there are relatively few properties and classes, but many more triples that use each of those classes and properties.

Your query has to iterate over the triples in the dataset until enough predicates have been found (i.e., until the LIMIT is reached). This can even result in scanning the whole dataset if there are fewer predicates than your LIMIT (fewer than 100, here).
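If an approximate answer is acceptable, one way to bound that scan is to take distinct predicates from a fixed-size sample of triples instead of from the whole dataset. The following is only a sketch: the inner LIMIT of 100000 is an arbitrary value chosen for illustration, and because SPARQL guarantees no ordering without ORDER BY, which triples end up in the sample depends on the store. If the store returns triples grouped by predicate, the sample may contain only a handful of distinct predicates.

```sparql
# Sketch: distinct predicates from a bounded sample of triples.
# The inner LIMIT (100000) is an assumed, arbitrary sample size.
SELECT DISTINCT ?p
WHERE {
  {
    # Inner subquery: stop after a fixed number of triples,
    # no matter how many distinct predicates they contain.
    SELECT ?p
    WHERE {
      ?s ?p ?o .
    }
    LIMIT 100000
  }
}
LIMIT 100
```

This may return fewer than 100 predicates, and it is not a complete list. The amount of work is capped by the inner LIMIT instead of by the number of distinct predicates found.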

LinkedGeoData has a fairly small number of properties (~1,805; see query text and live result [takes approximately 3 minutes]) and a fairly large number of triples (~1,384,887,592; see query text and live result [takes approximately 1 minute]), so your second query will be much slower.
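Counts like these can be obtained with a standard aggregate query. The version below is a minimal sketch and not necessarily the query text linked above; the variable names ?triples and ?predicates are my own. Expect it to run for minutes on a dataset of this size, and public endpoints may time out or cap the result.

```sparql
# Sketch: count all triples and all distinct predicates in one pass.
# ?triples and ?predicates are illustrative variable names.
SELECT (COUNT(*) AS ?triples)
       (COUNT(DISTINCT ?p) AS ?predicates)
WHERE {
  ?s ?p ?o .
}
```

The ratio of the two numbers gives a rough sense of how many triples have to be read, on average, before a new predicate turns up.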

A predicate index would certainly speed up this query; it's just not a default index in Virtuoso databases, because it wouldn't provide much benefit in most common scenarios (which this query is not). We discuss our default "3+2" indexing scheme, and how to add some additional sometimes-valuable indexes, in the documentation.

Upvotes: 2
