elvaras

Reputation: 25

Slow SPARQL query when joining two sets of URIs

I am investigating a SPARQL query that runs too slowly on my system. Greatly simplified, the query looks like this:

# The whole query takes ~20 seconds
SELECT ?baseUri_s1 {

    # This takes ~1 second and returns 3000 results
    { SELECT ?baseUri_s1 {
      # Here goes some more complex business logic
      ?baseUri_s1 myOntology:hasProperty1 'myProperty1'
    } }

    # This takes ~0.1 seconds and returns 1 result
    { SELECT ?baseUri_s2 {
      # Here goes some more complex business logic
      ?baseUri_s2 myOntology:hasProperty2 'myProperty2'
    } }

    FILTER (?baseUri_s1 = ?baseUri_s2)
}

So, if each of the two inner selects takes about a second or less, is it possible that joining a list of 3000 URIs with a list of one URI takes over 18 seconds? Am I missing something?

Upvotes: 1

Views: 226

Answers (1)

vassil_momtchev

Reputation: 1193

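According to the SPARQL spec, each subselect is evaluated independently. Because your two subselects share no variables, joining them produces a Cartesian product, and the FILTER is only applied to the combined rows afterwards. If the first subselect returns 1'000 results and the second 300, the Cartesian product between the two result sets contains 300'000 rows. Comparing 300'000 pairs is likely to be much slower.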

Why don't you simply execute the query as:

# Both patterns bind the same variable, so no FILTER is needed
SELECT ?baseUri_s {

    # Here goes some more complex business logic query 1
    ?baseUri_s myOntology:hasProperty1 'myProperty1' .

    # Here goes some more complex business logic query 2
    ?baseUri_s myOntology:hasProperty2 'myProperty2' .
}

This eliminates the costly Cartesian product between sub-queries that share no variables, and the query optimizer may be able to apply some of the restrictions from the complex business logic earlier.
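
If the business logic really has to stay inside separate subselects (for example because each one aggregates or limits its results), a possible variant is to keep both subselects but project the same variable name from each. This is only a sketch under assumptions: the variable name ?baseUri is made up for illustration, the prefix IRI is a placeholder for whatever myOntology: actually maps to, and whether the engine then uses an efficient join instead of a cross product depends on the triple store.

```sparql
# Placeholder IRI: replace with the real namespace of myOntology:
PREFIX myOntology: <http://example.org/myOntology#>

SELECT ?baseUri WHERE {

    # Subselect 1 projects ?baseUri
    { SELECT ?baseUri WHERE {
      # Here goes some more complex business logic
      ?baseUri myOntology:hasProperty1 'myProperty1'
    } }

    # Subselect 2 projects the same variable, so the two result sets
    # are joined on ?baseUri and no FILTER over all combinations is needed
    { SELECT ?baseUri WHERE {
      # Here goes some more complex business logic
      ?baseUri myOntology:hasProperty2 'myProperty2'
    } }
}
```

With a shared variable the join is expressed directly in the query, so the engine is not forced to build every combination of rows before discarding the ones that do not match.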

Upvotes: 1
