Reputation: 1531
I know the following SPARQL query against the Wikidata SPARQL endpoint is senseless. A similar query is generated automatically by my application. Please disregard its conceptual soundness, and let's dig into this strange (to me, at least) behavior.
SELECT ?year1 ?year_labelTemp
WHERE
{
  ?year1 <http://www.w3.org/2000/01/rdf-schema#label> ?year_labelTemp .
  { SELECT DISTINCT ?year1
    WHERE
    { ?film <http://www.wikidata.org/prop/direct/P577> ?date ;
            <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424>
      BIND(year(?date) AS ?year1)
    }
  }
}
LIMIT 10
According to SPARQL's evaluation semantics, a subquery is logically evaluated first, and its results are then projected up to the containing query. Consequently, this subquery is evaluated first:
SELECT DISTINCT ?year1
WHERE
{ ?film <http://www.wikidata.org/prop/direct/P577> ?date ;
        <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424>
  BIND(year(?date) AS ?year1)
}
The subquery returns exactly the expected results (130 distinct years). The results of this subquery (the ?year1 variable) are then projected up and joined with the triple pattern in the outer SELECT:
?year1 <http://www.w3.org/2000/01/rdf-schema#label> ?year_labelTemp .
However, since the outer pattern should have no matching data (?year1 is bound to plain integer years, and there are no labels for them), the join should give no results.
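To make that expectation concrete, here is a minimal Python model of how SPARQL joins solution mappings (SPARQL 1.1, section 18.3): two mappings are compatible when they agree on every variable bound in both, and the join merges each compatible pair. The helper names `compatible` and `join` and all the sample rows are hypothetical; mappings are modelled as dicts, with an unbound variable simply absent. It is a sketch of the standard semantics only, not of how Blazegraph actually executes the query.

```python
# Hypothetical model of SPARQL's Join over solution mappings (SPARQL 1.1, sec. 18.3).
# A mapping is a dict from variable name to value; an unbound variable is an absent key.

def compatible(m1: dict, m2: dict) -> bool:
    # Compatible iff the mappings agree on every variable bound in both.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left: list, right: list) -> list:
    # Merge every compatible pair (multiset semantics: duplicates are kept).
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

# Made-up matches of the outer pattern: ?year1 is bound to entity IRIs.
outer = [
    {"year1": "wd:Q212", "year_labelTemp": "Ukraina"},
    {"year1": "wd:Q207", "year_labelTemp": "George W. Bush"},
]
# Made-up subquery result: every ?year1 is bound to an integer year.
inner = [{"year1": 1999}, {"year1": 2004}]

print(join(outer, inner))  # -> []
```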
Surprisingly (to me, at least), executing the whole query stated first does return results, and the results are weird:
wd:Q43576 Mië
wd:Q221 Masèdonia
wd:Q221 Республикэу Македоние
wd:Q221 Republiek van Masedonië
wd:Q212 Украина
wd:Q212 Ukraina
wd:Q212 Украинэ
wd:Q212 Oekraïne
wd:Q207 George W. Bush
wd:Q207 George W. Bush
What am I missing?
Upvotes: 7
Views: 2224
Reputation: 307
You wrote that the subquery gave exactly the expected result, but I think you missed one value! Some films have an empty "unknown value" as their publication date, for example Q18844655 (at least at the time of writing). It was this empty value that resulted in the seemingly random objects being found.
If you change your inner SELECT by adding, for example, FILTER(datatype(?date) = xsd:dateTime), you will only get actual dates and therefore only actual years, which means one value fewer than without the filter.
(When this corrected inner SELECT is used, the whole query times out. It seems the labelling really doesn't like odd values like these.)
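Under standard SPARQL semantics, this is what an unbound variable does in a join. The Python sketch below uses hypothetical helpers (`compatible`, `join`, `bind_year`) and made-up rows, and it assumes that year() raises a type error on the "unknown value", so BIND leaves ?year1 unbound in that row (modelled as an empty dict). An unbound ?year1 is compatible with every outer row, so every label survives the join; dropping the non-date value first, as the datatype filter does, leaves an empty join.

```python
from datetime import datetime

def compatible(m1: dict, m2: dict) -> bool:
    # Mappings must agree on every variable bound in both (unbound = absent key).
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left: list, right: list) -> list:
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def bind_year(date) -> dict:
    # Hypothetical model of BIND(year(?date) AS ?year1): if year() raises a
    # type error, ?year1 is left unbound, so the row has no "year1" key.
    return {"year1": date.year} if isinstance(date, datetime) else {}

# Made-up P577 values: two real dates and one "unknown value",
# modelled here as an opaque blank-node label (an assumption).
dates = [datetime(1999, 5, 1), datetime(2004, 3, 2), "_:unknown"]

# Made-up outer-pattern matches: ?year1 bound to entity IRIs.
outer = [
    {"year1": "wd:Q221", "year_labelTemp": "Masèdonia"},
    {"year1": "wd:Q212", "year_labelTemp": "Ukraina"},
]

inner = [bind_year(d) for d in dates]
print(len(join(outer, inner)))  # -> 2: each label row joins with the unbound row

# Mirrors FILTER(datatype(?date) = xsd:dateTime): skip the non-date value.
inner_filtered = [bind_year(d) for d in dates if isinstance(d, datetime)]
print(join(outer, inner_filtered))  # -> []
```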
Upvotes: 0
Reputation: 11479
The problem is that sometimes BIND does not project variables correctly.
You can check this with the following query:
SELECT ?year1 ?year_labelTemp ?projected
WHERE
{
  ?year1 rdfs:label ?year_labelTemp .
  hint:Prior hint:runLast true .
  { SELECT DISTINCT ?year1
    WHERE
    { ?film wdt:P577 ?date ;
            wdt:P31 wd:Q11424
      BIND(year(?date) AS ?year1)
      hint:SubQuery hint:runOnce true
    }
  }
  BIND(bound(?year1) AS ?projected)
}
LIMIT 10
Fortunately, the following trick helps:
SELECT ?year1 ?year_labelTemp
WHERE
{
  ?year1 rdfs:label ?year_labelTemp .
  hint:Prior hint:runLast true .
  { SELECT DISTINCT ?year1
    WHERE
    { ?film wdt:P577 ?date ;
            wdt:P31 wd:Q11424
      BIND(year(?date) AS ?year1)
      FILTER (?year1 > 0)
    }
  }
}
LIMIT 10
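Under the standard semantics, a FILTER whose expression raises an error eliminates the row, and ?year1 > 0 cannot be evaluated when ?year1 is unbound, so this filter drops any row in which BIND left ?year1 unbound. A small Python sketch of that effect, with hypothetical helpers (`compatible`, `join`, `filter_positive_year`) and made-up rows; it models the spec, not Blazegraph's internals.

```python
def compatible(m1: dict, m2: dict) -> bool:
    # Mappings must agree on every variable bound in both (unbound = absent key).
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left: list, right: list) -> list:
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def filter_positive_year(rows: list) -> list:
    # Hypothetical model of FILTER(?year1 > 0): comparing an unbound ?year1 is
    # an error, and a FILTER expression that errors eliminates the row.
    return [m for m in rows if "year1" in m and m["year1"] > 0]

# Made-up subquery rows before the filter: two years plus one row in which
# BIND left ?year1 unbound.
inner = [{"year1": 1999}, {"year1": 2004}, {}]
# Made-up outer-pattern match.
outer = [{"year1": "wd:Q212", "year_labelTemp": "Ukraina"}]

print(len(join(outer, inner)))                   # -> 1 (via the unbound row)
print(join(outer, filter_positive_year(inner)))  # -> []
```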
The bug can be reproduced without nested subqueries and with hint:Query hint:optimizer "None", so it should not be a query optimizer bug. Interestingly, though, the bug disappears after replacing wd:Q11424 with wd:Q24862.
BLZG-963 seems to be the most closely related issue (as you can see there, built-in functions are involved too).
Upvotes: 2