Nicolas Raoul

Reputation: 60193

How to check for a sub-property at all levels expanded from a SPARQL * wildcard?

In Wikidata, I want to find an item's country: either directly, if the item itself has a P17 (country), or by climbing up the chain of P131s (located in the administrative territorial entity) until I find a country. Here is the query:

?item wdt:P131*/wdt:P17 ?country.
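
For reference, a complete version of this pattern could look like the sketch below. It assumes the Wikidata Query Service, where the wd:, wdt:, wikibase: and bd: prefixes are predefined, and it uses Q25270 as the example item:

```sparql
# Sketch: every country reachable from the item, either directly
# or through any number of P131 hops (zero or more).
SELECT DISTINCT ?country ?countryLabel WHERE {
  wd:Q25270 wdt:P131*/wdt:P17 ?country .
  # Label service of the Wikidata Query Service, used for readable names.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
```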

The query above works fine... except when a subdivision used to belong to another country, as with Q25270 (Prishtina). In such cases, the result can be anachronistic. That's what I want to fix.

Great news: in such cases we should only consider the unique P131 (located in the administrative territorial entity) that has no P582 (end time) sub-property attached to it, and the problem is solved!

My question: how to alter my query above to achieve that?

Example: say MyItem is in MyStreet, which is in MyTown, which is in MyRegion, which is in MyCountry. I must make sure that none of the P131 links along this chain has a P582 (end time).

(If "sub-property" is not the correct term, please let me know the right term and I will fix the question, thanks!)

An attempt

The query below works in most cases, but unfortunately it has a bug: it finds the wrong country when the current country was also the country at some point in the past (for instance, Alsace belonged to France until 1871, then to Germany, and now belongs to France again).

SELECT DISTINCT ?country WHERE {
  wd:Q6556803 wdt:P131* ?area .
  ?area wdt:P17 ?country .
  OPTIONAL {
    wd:Q6556803 wdt:P131*/p:P131 [
      pq:P582 ?endTime; ps:P131/wdt:P131* ?area
    ] .
  } .
  FILTER( !BOUND( ?endTime ) ) .
}

Upvotes: 3

Views: 664

Answers (1)

evsheino

Reputation: 2277

Wikidata uses different properties for direct links and links with extra information. So, for the statement "Prishtina is located in the administrative territorial entity Socialist Autonomous Province of Kosovo", there's the simple triple:

wd:Q25270 wdt:P131 wd:Q646035

And the long form with additional information (the end time):

wd:Q25270 p:P131 wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b .

wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b ps:P131 wd:Q646035 ;
    pq:P582 "1990-01-01T00:00:00Z"^^xsd:dateTime .
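
To inspect these statement nodes yourself, a query along the following lines may help. It is a minimal sketch, assuming the Wikidata Query Service and its predefined prefixes. It lists every P131 statement on Q25270 together with its end time, if it has one:

```sparql
# Sketch: list all P131 statements of Prishtina (Q25270)
# with their optional end-time qualifier.
SELECT ?statement ?area ?areaLabel ?endTime WHERE {
  wd:Q25270 p:P131 ?statement .                # item -> statement node
  ?statement ps:P131 ?area .                   # statement node -> value
  OPTIONAL { ?statement pq:P582 ?endTime . }   # end time, only if present
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
```

Rows where ?endTime is unbound are the statements with no end time.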

So, we need to filter out all paths that contain a statement with an end-time qualifier (pq:P582):

SELECT DISTINCT ?s ?sLabel ?country ?countryLabel {
  VALUES ?s {
    wd:Q25270 
  }
  ?s wdt:P131* ?area .
  ?area wdt:P17 ?country .
  FILTER NOT EXISTS {
    ?s p:P131/(ps:P131/p:P131)* ?statement .
    ?statement ps:P131 ?area .
    ?s p:P131/(ps:P131/p:P131)* ?intermediateStatement .
    ?intermediateStatement (ps:P131/p:P131)* ?statement .
    ?intermediateStatement pq:P582 ?endTime .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
LIMIT 50

Here, ?intermediateStatement is any P131 statement with an end time that lies on the path from ?s to ?area.

This query does seem to time out when more than one value is given for ?s. It also does not handle the case where several links lead from an item to the same area, one with an end time and the other without: both paths will be filtered out.
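
One possible way around the second limitation is to give up the * wildcard and unroll the hops by hand, so that each hop can be checked on its own statement node. The sketch below is untested against the live endpoint and rests on two assumptions: that a chain is at most two P131 hops deep (the depth is purely illustrative; deeper chains need further UNION branches), and that the query runs on the Wikidata Query Service with its predefined prefixes. Note also that p:P131 matches statements of every rank, unlike wdt:P131.

```sparql
# Sketch: follow only P131 statements that have no end time,
# unrolled to a fixed depth instead of using a property path.
SELECT DISTINCT ?s ?country WHERE {
  VALUES ?s { wd:Q25270 }
  {
    # Depth 0: the item states its country directly.
    ?s wdt:P17 ?country .
  } UNION {
    # Depth 1: one P131 hop whose statement has no end time.
    ?s p:P131 ?st1 .
    ?st1 ps:P131 ?a1 .
    FILTER NOT EXISTS { ?st1 pq:P582 ?end1 . }
    ?a1 wdt:P17 ?country .
  } UNION {
    # Depth 2: two P131 hops, neither statement has an end time.
    ?s p:P131 ?st1 .
    ?st1 ps:P131 ?a1 .
    FILTER NOT EXISTS { ?st1 pq:P582 ?end1 . }
    ?a1 p:P131 ?st2 .
    ?st2 ps:P131 ?a2 .
    FILTER NOT EXISTS { ?st2 pq:P582 ?end2 . }
    ?a2 wdt:P17 ?country .
  }
}
```

Because each hop is tested on its own statement, an area linked by both an ended and a current statement is still reached through the current one.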

Upvotes: 1
