Keanu Paik
Keanu Paik

Reputation: 324

Query to get properties of quantity for a certain entity class

What I am trying to do is to get properties with quantity property types of a certain class (such as City, Country, Human, River, Region, Mountain, etc). I tried several classes like Country (wd:Q6256) works okay with the query below, but many other classes makes the query to exeed time limit. How can I achieve the result optimizing the query below? or is there any other way to get the properties of Quantity type in a certain class?

SELECT DISTINCT ?p_ ?pLabel ?pAltLabel
WHERE {
  VALUES (?class) {(wd:Q515)}
  ?x ?p_ [].
  ?x p:P31/ps:P31 ?class.

  ?p wikibase:claim ?p_.
  ?p wikibase:directClaim ?pwdt.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}

Upvotes: 2

Views: 139

Answers (1)

cygri
cygri

Reputation: 9472

Attempt 1: Optimizing the query

Some observations:

  • Instead of p:P31/ps:P31, you could use wdt:P31 which is faster by avoiding the two-property hop, but finds only the truthy statements
  • The expensive part is the call to the label service at the end, as can be seen by commenting that line out by placing # at the start of the line
  • The query retrieves every claim on every city (many!), gets the properties of the claims (few!), and only removes the duplicates in the end (with DISTINCT)
  • As a result, the label service is called many times for the same property, once per claim! This is the big problem with the query
  • This can be avoided by moving the retrieval of properties with the DISTINCT into a subquery, and calling the label service only at the end on the few properties
  • After that change it should be fast, but is still slow because the query optimiser seems to evaluate the query in the wrong order. Following hints from this page, we can turn the query optimiser off.

This works for me:

SELECT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT DISTINCT ?p_ {
      VALUES ?class { wd:Q515 }
      ?x wdt:P31 ?class.
      ?x ?p_ [].
    }
  }
  ?p wikibase:claim ?p_.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Attempt 2: Splitting the task into multiple queries

Having learned that the approach above doesn't work for some of the biggest categories (like wd:Q5, “human”), I tried a different approach. This will get all results, but not in a single query. It requires sending ~25 individual queries and combining the results afterwards:

  • We start by listing the quantity properties. There are, as of today, 503 of them.
  • We want to keep only those properties that are actually used on an item of type “human”.
  • Because that check is so slow (it needs to look at millions of items), we start by only checking the first 20 properties from our list.
  • In the second query, we're going to check the next 20, and so on.

This is the query that tests the first 20 properties:

SELECT DISTINCT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT ?p ?pLabel ?pAltLabel {
      ?p wikibase:propertyType wikibase:Quantity.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    OFFSET 0 LIMIT 20
  }
  ?p wikibase:claim ?p_.
  ?x ?p_ [].
  ?x wdt:P31 wd:Q5.
}

Increase the OFFSET to 20, 40, 60 and so on, up to 500, to test all properties.

Upvotes: 2

Related Questions