Nachtgold

Reputation: 548

Apache Jena querying: Too complex or not too complex

I have the following data in my database:

@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .

wd:Q4472 rdfs:label
        "Aulesti"@en ;
    p:P625 wds:q4472-589D5F6B-FC8C-4584-9546-89A2930F141A ;
    wdt:P625 "Point(-2.5628139 43.296423)"^^geo:wktLiteral .

wds:q4472-589D5F6B-FC8C-4584-9546-89A2930F141A ps:P625 "Point(-2.5628139 43.296423)"^^geo:wktLiteral .

As you can see, the location information exists twice. It's a sample from Wikidata, where that happens quite often. But sometimes only one of them exists, and it is not known which one.

So normally I would collect them with the following query:

PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX p: <http://www.wikidata.org/prop/> 
PREFIX ps: <http://www.wikidata.org/prop/statement/> 

SELECT ?place ?coord WHERE { 
  ?place wdt:P625 ?coord
  OPTIONAL {
    ?place p:P625 ?itemLocation .
    ?itemLocation ps:P625 ?coord .
  }
}

As a human, I would take the shortest path first, and only when it yields no ?coord would I take the long route via ?itemLocation.

Which strategy is used by Jena?

Is it a suitable way to get the same result value from two different relations?

Should I query twice?

Do you have another option?

Upvotes: 1

Views: 145

Answers (1)

Median Hilal

Reputation: 1531

It depends on what data you want. I think that if you want the most relevant value(s), you should use only ?place wdt:P625 ?coord. The explanation follows.

The prefix wdt stands for the direct relationship, PREFIX wdt: <http://www.wikidata.org/prop/direct/>; this is related to the way Wikidata represents knowledge. Wikidata is not originally in RDF format, so some modeling approach had to be adopted to represent its data in RDF.

The method used is, in some sense, similar to RDF reification, where you might need to add (meta)information to statements (triples) of the form s p o: for example, the source, the author, the time of the statement and so on. Wikidata also needs a way to establish an ordering between the multiple values of multi-valued properties. For example, the population of the US might have been recorded as 290 million in 2008 and as 310 million in 2016, so a population property might have two different values. Wikidata assigns ranks so that the 2016 value (310) is preferred to the 2008 value (290) when querying for the direct values. When you use the wdt prefix with a property name, you query for direct values, which are the ones with the highest rank among the alternatives (you should read this). Otherwise (p prefix), you match against the customized Wikidata model for representing data in RDF; see for example this query.
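
To make the contrast concrete, here is a minimal sketch of a query that walks the full statement model and also reads each statement's rank. It assumes the data carries rank triples under the wikibase: ontology prefix; the sample in the question does not show any, so that part is wrapped in OPTIONAL.

```sparql
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX wikibase: <http://wikiba.se/ontology#>

# Full statement path: returns every P625 statement, whatever its rank.
SELECT ?place ?coord ?rank WHERE {
  ?place p:P625 ?statement .
  ?statement ps:P625 ?coord .
  # Assumption: rank triples were loaded; they are absent from the sample data.
  OPTIONAL { ?statement wikibase:rank ?rank }
}
```

Compared with ?place wdt:P625 ?coord, this can return more than one row per place, because it does not filter by rank.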

When you query for a property using the wdt prefix, you get two benefits: first, you skip the verbose syntax needed to reach values through statement nodes, and second, you get the best-ranked value for each property. I think this is also the better choice performance-wise.
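
If the local data really can contain only one of the two forms, as the question describes, one possible sketch is to take the coordinate from either path with UNION. This is an illustration rather than a recommendation: the statement branch is not filtered by rank, so it may also return values that the direct wdt form would leave out.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>

# DISTINCT collapses the duplicate row when both forms hold the same value.
SELECT DISTINCT ?place ?coord WHERE {
  { ?place wdt:P625 ?coord }            # short path: direct value
  UNION
  {                                     # long path: via the statement node
    ?place p:P625 ?statement .
    ?statement ps:P625 ?coord .
  }
}
```

Unlike the query in the question, this does not require the wdt:P625 triple to be present before the statement path is tried.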

Upvotes: 2
