Reputation: 165
Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
  ?country
  ?countryLabel
  ?hdi_value
  ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I'm aware of GROUP BY/GROUP_CONCAT, but that would still give me every result when I'd prefer to have just one. GROUP BY/SAMPLE will not work either, since SAMPLE is not guaranteed to pick the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population (P1082) in this query returns only one result per country:
SELECT
  ?country
  ?countryLabel
  ?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
  ?country
  ?countryLabel
  ?hdi
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Upvotes: 6
Views: 3140
Reputation: 11459
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of SPARQL query to get only results with the most recent date.
This technique does the trick:
FILTER NOT EXISTS {
  ?country p:P1081/pq:P585 ?hdi_date_ .
  FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside of OPTIONAL; it does not work inside of OPTIONAL (and I'm not sure this is not a bug).
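For illustration, here is how the question's first query might look with the clause added after the OPTIONAL block. This is only a sketch and I haven't tuned it: the variable names are the ones from the question, `?hdi_date_` is just a helper variable, and if a country has two data points with the same latest date you would still get both rows.

```sparql
SELECT ?country ?countryLabel ?hdi_value ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  # Outside OPTIONAL: drop any row for which the same country
  # has another HDI data point with a later date.
  FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Countries without any HDI statement should still appear, with `?hdi_value` and `?hdi_date` unbound, because the inner pattern finds nothing to compare against.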
Question 2 is a duplicate of Some cities aren't instances of city or big city?
You can't use wdt: predicates to get every value, because the statements that seem to be missing are not truthy: they are normal-rank statements on a property that also has a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
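The reason why P1082 always has a preferred statement is that this property is processed by PreferentialBot. P1081 evidently has no preferred statement on these items, so all of its normal-rank statements are truthy and wdt:P1081 returns each of them.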
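If you want to check the ranks yourself, something along these lines should list every population statement together with its rank, not just the truthy ones. It is a sketch using the `p:`/`ps:` prefixes from the question plus `wikibase:rank`; the variable names `?statement` and `?rank` are my own.

```sparql
SELECT ?country ?countryLabel ?population ?rank
WHERE {
  ?country wdt:P31 wd:Q6256.
  # p: reaches the statement node, so statements of every rank are matched
  ?country p:P1082 ?statement.
  ?statement ps:P1082 ?population.
  # ?rank is wikibase:PreferredRank, wikibase:NormalRank or wikibase:DeprecatedRank
  ?statement wikibase:rank ?rank.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Replacing P1082 with P1081 in both places gives the same view for HDI, which lets you compare how the two properties are ranked.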
Upvotes: 4