SPARQL Unbound Variables

Question

I need help understanding exactly what this SPARQL query does (asks for):

SELECT ?subject ?object
    WHERE { ?subject onto:personName ?object . ?w ?q ?s}

This is from an ontology of people. The first part is easy to understand; some unknown subject must have a personName with some unknown value, but why is the second part relevant?

Here is what I need help understanding: when there are two patterns (before and after the "."), does this mean the nodes must match both the first AND the second pattern? If so, the second pattern in this case seems to say: "some unknown subject must have some unknown predicate and some unknown object", which would return all the triples in the RDF graph. So in this case, the second pattern would match all triples, but the result would then be restricted by the first pattern.

My question is why the second pattern matters at all, as it would seem to be equivalent to just the first pattern. But when I run it, I get "all" the triples in the ontology, so I need to understand exactly what the query does.

Also: what would happen if I replaced ?w with ?object, so that ?object appears in both patterns? It would seem to say that ?object must have some predicate ?q and some object ?s, so the query would find the transitive object (?s) of ?object in pattern 1. In other words: ?subject --> onto:personName --> ?object --> ?q --> ?s.

Jeen Broekstra · Accepted Answer

Here is what I need help understanding: when there are two patterns (before and after the "."), does this mean the nodes must match both the first AND the second pattern?

Yes.

If so, the second pattern in this case seems to say: "some unknown subject must have some unknown predicate and some unknown object", which would return all the triples in the RDF graph.

Correct.

So in this case, the second pattern would match all triples, but the result would then be restricted by the first pattern.

Actually, the result of the second pattern is not restricted by the first pattern at all, since the patterns share no variables. The reason you don't get back all the triples in your data set is simply that your SELECT clause only lists variables from the first pattern. If you were to say SELECT * or SELECT ?subject ?object ?w ?q ?s instead, you would get everything back.

My question is why the second pattern matters at all, as it would seem to be equivalent to just the first pattern.

It's not equivalent, because the second pattern matches all possible triples, while the first pattern only matches those triples that have onto:personName as their predicate.

And while none of the values found for ?w, ?q, or ?s show up in your query result, the inclusion of this pattern does have an impact on your result. By specifying two patterns as you do, you are effectively expressing a join between two patterns. Since the two patterns share no variables, there is no "join condition", so to speak. The upshot of this is that the result of your query will be the Cartesian product of all triples matching the first pattern and all triples matching the second.

So what you will see in the query result is that you will get many duplicate rows for ?subject and ?object: each pair will be repeated N times, where N is the number of triples matching the second pattern.
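To make the duplication concrete, here is a minimal Python sketch that mimics how the two patterns combine. It is only an illustration, not how any SPARQL engine is implemented, and the toy graph (ex:alice, ex:bob, onto:knows) is made up for the example:

```python
from itertools import product

# Hypothetical toy graph: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "onto:personName", "Alice"),
    ("ex:bob", "onto:personName", "Bob"),
    ("ex:alice", "onto:knows", "ex:bob"),
]

# Pattern 1: ?subject onto:personName ?object
pattern1 = [(s, o) for (s, p, o) in triples if p == "onto:personName"]

# Pattern 2: ?w ?q ?s matches every triple in the graph.
pattern2 = list(triples)

# The patterns share no variables, so joining them is a plain
# Cartesian product: every pattern-1 match pairs with every triple.
joined = list(product(pattern1, pattern2))

# SELECT ?subject ?object keeps only the pattern-1 columns.
rows = [pair for (pair, _any_triple) in joined]

for row in rows:
    print(row)
```

With 2 matches for the first pattern and N = 3 triples in total, the sketch yields 2 x 3 = 6 rows, and each (?subject, ?object) pair appears 3 times.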

So in conclusion, unless you have a very exotic use case, the second pattern is worse than useless as it not only does not produce any useful data in the result, but ensures that the result contains a large number of duplicates (and likely also causes your query execution time to be much higher than really necessary).

Also: what would happen if I replaced ?w with ?object, so that ?object appears in both patterns? It would seem to say that ?object must have some predicate ?q and some object ?s, so the query would find the transitive object (?s) of ?object in pattern 1. In other words: ?subject --> onto:personName --> ?object --> ?q --> ?s.

Indeed. If you did this, you would have a shared variable and therefore a join condition between the two patterns. The effect on the query result would be that you no longer receive all ?subject ?object pairs for which an onto:personName predicate exists, but only those for which it is also true that the ?object value occurs as the subject of some other triple somewhere in the dataset.
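The effect of the shared variable can be sketched the same way in Python. Again this is only an illustration under assumptions: the toy graph is hypothetical, and it assumes onto:personName points at name resources (ex:name1, ex:name2) rather than plain strings, so that the ?object value is able to appear as a subject:

```python
# Hypothetical toy graph: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "onto:personName", "ex:name1"),
    ("ex:bob", "onto:personName", "ex:name2"),
    ("ex:name1", "onto:firstName", "Alice"),
    ("ex:name1", "onto:lastName", "Smith"),
]

# Pattern 1: ?subject onto:personName ?object
pattern1 = [(s, o) for (s, p, o) in triples if p == "onto:personName"]

# Pattern 2 is now ?object ?q ?s, so a triple only joins with a
# pattern-1 match when its subject equals that match's ?object.
joined = [
    (subject, obj, q, s)
    for (subject, obj) in pattern1
    for (w, q, s) in triples
    if w == obj  # the join condition introduced by the shared variable
]

# SELECT ?subject ?object keeps only the first two columns.
rows = [(subject, obj) for (subject, obj, _q, _s) in joined]

for row in rows:
    print(row)
```

In this sketch ex:bob drops out, because ex:name2 is never the subject of any triple, while ex:alice appears once for each triple that has ex:name1 as its subject.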
