Latitia

Reputation: 31

How can I optimize my recursive SPARQL query?

I'm trying to extract buildings from Wikidata using a recursive SPARQL query but I keep getting query timeouts. Is there a way to circumvent this?

This is my current query, selecting all buildings with either a Freebase ID or a Google Knowledge Graph ID, and a Dutch label:

SELECT DISTINCT ?building ?buildingLabel
WHERE {
  ?building p:P2671|p:P646 ?id;
            p:P31/ps:P31/wdt:P279* wd:Q41176;
            rdfs:label ?buildingLabel .
  FILTER(LANG(?buildingLabel) = 'nl') .
  FILTER (?building != ?buildingLabel) .
}

I've tried manually checking a few layers deep instead, but for some reason I get no results at three or more layers, even though such buildings definitely exist. I've tried this using:

SELECT ?building
WHERE {
 ?building p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 wd:Q41176]].
}

and using

SELECT ?building
WHERE {
 ?parent2 p:P31/ps:P31/wdt:P279 wd:Q41176.
 ?parent1 p:P31/ps:P31/wdt:P279 ?parent2.
 ?building p:P31/ps:P31/wdt:P279 ?parent1.
}

There are about 2.24 million buildings and about 18 million entities with either a Freebase ID or a Google Knowledge Graph ID on Wikidata. I've looked at this guide but couldn't quite figure out how to apply it to my query. I've also read the answer to this question but, unfortunately, using multiple queries isn't really an option for me.

Upvotes: 3

Views: 152

Answers (1)

Gregory Williams

Reputation: 466

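If your intention is to use the "recursive" property path to find things of type building and also things whose type is a subclass of building, your first query using wdt:P279* is right. The later attempts that repeat the full p:P31/ps:P31/wdt:P279 pattern won't match the data you expect: every repetition starts with another P31 (instance of) hop, so each intermediate class would itself have to be an instance of something whose superclass is the next class up. That is not how the subclass hierarchy under building is usually modelled. The instance-of hop should happen once, followed only by P279 (subclass of) hops.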
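
To see why, here is a toy model in Python of how the patterns walk the graph. The item and class names are hypothetical (not real Wikidata entities), statement ranks are ignored, and each function only stands in for the SPARQL pattern named in its docstring; it is a sketch of the semantics, not of how the query service evaluates them.

```python
from collections import deque

# Hypothetical toy data, not real Wikidata items; statement ranks are ignored.
INSTANCE_OF = {  # item -> its P31 values
    "my_house": {"detached_house"},
}
SUBCLASS_OF = {  # class -> its P279 values
    "detached_house": {"house"},
    "house": {"residential_building"},
    "residential_building": {"building"},
}


def superclasses_star(cls):
    """Classes reachable from cls by zero or more P279 hops, like wdt:P279*."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def matches_star(item, target):
    """Models ?item wdt:P31/wdt:P279* target."""
    return any(target in superclasses_star(t) for t in INSTANCE_OF.get(item, ()))


def matches_unrolled(item, target, depth):
    """Models ?item wdt:P31/wdt:P279/.../wdt:P279 target with `depth` P279 hops."""
    frontier = set(INSTANCE_OF.get(item, ()))  # the single P31 hop
    for _ in range(depth):
        frontier = {p for c in frontier for p in SUBCLASS_OF.get(c, ())}
    return target in frontier


def matches_repeated(item, target, layers):
    """Models the question's pattern: (P31, then one P279) repeated `layers` times."""
    frontier = {item}
    for _ in range(layers):
        frontier = {
            p
            for node in frontier
            for t in INSTANCE_OF.get(node, ())  # requires a P31 on every node
            for p in SUBCLASS_OF.get(t, ())
        }
    return target in frontier


print(matches_star("my_house", "building"))         # True
print(matches_unrolled("my_house", "building", 3))  # True
print(matches_repeated("my_house", "building", 3))  # False: "house" has no P31
```

In the repeated version, the second layer asks for the P31 of the class "house", which has none, so the walk dies out no matter how deep the real hierarchy goes.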

By simplifying the first query a bit I was able to get this to run (returning 96,297 results in 39s):

SELECT DISTINCT ?building ?buildingLabel
WHERE {
  ?building p:P2671|p:P646 ?id;
            wdt:P31/wdt:P279* wd:Q41176 .
  ?building rdfs:label ?buildingLabel .
  FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))
}

Two notable changes:

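  • p:P31/ps:P31 is replaced by wdt:P31, removing one join from the query. Note that wdt:P31 only returns best-rank ("truthy") statements, so deprecated P31 values are no longer followed.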
  • The second FILTER is unnecessary, as ?building (a URI) and ?buildingLabel (a string literal) can never be equal.
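
The rewritten query also changes the language test from LANG(?buildingLabel) = 'nl' to LANGMATCHES(LANG(?buildingLabel), "nl"). The two are not identical: per the SPARQL 1.1 spec, langMatches uses RFC 4647 basic filtering, so it is case-insensitive and a range of "nl" also accepts tags such as "nl-be", while = compares the tag string exactly. The Python below is a small stand-alone model of that difference; the function names are made up for illustration, and it says nothing about which form the query service evaluates faster.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Model of SPARQL LANGMATCHES: RFC 4647 basic filtering."""
    if lang_range == "*":
        # "*" matches any non-empty language tag.
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    # Match if equal to the range, or the range followed by "-" and more subtags.
    return tag == lang_range or tag.startswith(lang_range + "-")


def lang_equals(tag: str, value: str) -> bool:
    """Model of FILTER(LANG(?x) = 'nl'): plain string equality."""
    return tag == value


for tag in ["nl", "nl-be", "nds-nl", "en"]:
    print(tag, lang_matches(tag, "nl"), lang_equals(tag, "nl"))
```

Only "nl-be" separates the two: it passes LANGMATCHES but not the equality test, while "nds-nl" fails both because "nl" is not a prefix of it.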

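If you still want the fixed-depth approach from the question, the repetition should cover only the subclass hop: one wdt:P31 followed by a fixed number of wdt:P279 steps, with a UNION over the depths you want. SPARQL 1.1 property paths have no counted {n,m} form, so each depth has to be written out. Below is a minimal Python sketch that generates such a query; the helper name unrolled_query and its default depth are arbitrary, and it has not been checked whether the result runs faster than wdt:P279* on the live endpoint.

```python
def unrolled_query(target: str = "wd:Q41176", max_depth: int = 3) -> str:
    """Build a query matching items whose P31 value is `target` or a subclass
    of it at most `max_depth` wdt:P279 hops away (hypothetical helper)."""
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    branches = []
    for depth in range(max_depth + 1):
        # Exactly one instance-of hop, then `depth` subclass-of hops.
        path = "/".join(["wdt:P31"] + ["wdt:P279"] * depth)
        branches.append(f"  {{ ?building {path} {target} }}")
    union = "\n  UNION\n".join(branches)
    # DISTINCT removes items that match at more than one depth.
    return (
        "SELECT DISTINCT ?building ?buildingLabel\n"
        "WHERE {\n"
        "  ?building p:P2671|p:P646 ?id .\n"
        f"{union}\n"
        "  ?building rdfs:label ?buildingLabel .\n"
        '  FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))\n'
        "}"
    )


print(unrolled_query(max_depth=2))
```

Unlike wdt:P279*, this version silently misses any building whose type sits more than max_depth subclass steps below Q41176.
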
Upvotes: 1
