Reputation: 1
I’m looking for a SPARQL query to extract all instances of a Wikidata entity (e.g., countries (Q6256)) and of any of its subclasses, while ignoring truthiness. My goal is to get all countries (including historic ones) and filter the results later on, if necessary. I also want to reuse the same query structure for other entities later on (e.g., cities) by swapping out the QID.
I came up with the following query:
SELECT DISTINCT ?item ?desc WHERE {
?item p:P31/ps:P31/p:P279*/ps:P279* wd:Q6256.
?item rdfs:label ?desc filter (lang(?desc) = "en").
} ORDER BY ?desc
This gets me the direct instances (P31 = instance_of) and the instances of subclasses (P279 = subclass_of), while ignoring truthiness (by using 'p:/ps:' instead of 'wdt:'). One historic country that was often missing otherwise, Q16957, is included by this query.
Unfortunately, when I ran the same query for cities (Q486972 = human settlements), I lost ⅔ of the instances compared to using 'wdt:', so something must be wrong and instances are going missing:
This query returns a count of 2,933,165:
SELECT DISTINCT (COUNT(?item) AS ?count) WHERE {
?item wdt:P31/wdt:P279* wd:Q486972.
}
While this one returns only 981,268:
SELECT DISTINCT (COUNT(?item) AS ?count) WHERE {
?item p:P31/ps:P31/p:P279*/ps:P279* wd:Q486972.
}
So my question is: what would be the correct query/statement to get all instances of a Wikidata entity, including instances of its subclasses and ignoring truthiness, without losing any potential countries, cities, etc.?
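One possible cause, offered as a sketch rather than a verified fix: in `p:P279*/ps:P279*` the `*` applies to each step separately, so as far as I can tell the path cannot alternate between statement nodes and their values for more than one subclass hop. Grouping the two steps, as in `(p:P279/ps:P279)*`, repeats the whole statement-then-value hop instead. The variant below assumes the Wikidata Query Service's predefined prefixes; its count has not been checked here, and the query may time out for large classes.

```sparql
# Sketch: count query with the subclass hop grouped before it is repeated.
# COUNT(DISTINCT ?item) counts each item once, so the figure is not
# directly comparable with the two counts reported above.
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  # p:P31 leads to the statement node, ps:P31 to its value (any rank).
  # (p:P279/ps:P279)* then follows zero or more complete subclass hops.
  ?item p:P31/ps:P31/(p:P279/ps:P279)* wd:Q486972.
}
```

Swapping `wd:Q486972` for `wd:Q6256` would give the corresponding query for countries.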
Upvotes: 0
Views: 392