jordipala

Reputation: 303

SPARQL DISTINCT gives duplicates in Virtuoso

The following SPARQL query returns duplicate rows in Virtuoso even though it uses the DISTINCT clause. You can test the query against the DBpedia public endpoint. What is wrong with the query?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?person1 ?person1_id ?person2 ?person2_id ?person2_rank
FROM <http://dbpedia.org> 
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank> 
WHERE {
    ?person1 rdf:type dbpedia-owl:Person.
    ?person2 rdf:type dbpedia-owl:Person.
    ?person1 ?link ?person2.
    ?person1 dbpedia-owl:wikiPageID ?person1_id.
    ?person2 dbpedia-owl:wikiPageID ?person2_id.
    ?person2 vrank:hasRank/vrank:rankValue ?person2_rank.
    FILTER (?person1_id != ?person2_id).
    FILTER (?person1_id = 308)
} ORDER BY DESC(?person2_rank) ASC(?person2_id)

The results include rows that appear to be duplicates, e.g.:

http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus  8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus  8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082

Upvotes: 2

Views: 408

Answers (2)

Paul Cuddihy

Reputation: 567

Although this won't help with your DBpedia query, anyone who arrived here by searching on the title and who has control of the model and data may want to know the following:

In Virtuoso, values typed as double do not seem to suffer from the SELECT DISTINCT problem that occurs with float.

Upvotes: 0

Joshua Taylor

Reputation: 85813

I can confirm that there appear to be duplicates in the results. I'm not absolutely sure what causes them, but I wonder if it might have something to do with inexact equality for floating-point numbers. If, instead of selecting the floating-point numbers directly, you select their lexical forms (note the (str(...) as ?rank) at the end):

SELECT DISTINCT
  ?person1 ?person1_id
  ?person2 ?person2_id
  (str(?person2_rank) as ?rank)

I get none of the duplicates. This might be worth reporting to the Virtuoso developers as a bug. For what it's worth, if you want floating-point values for rank, you can use xsd:float as a function to turn that string back into a floating-point value. When I do that, with a SELECT clause like the following, I still get the expected distinct results.

SELECT DISTINCT
  ?person1 ?person1_id
  ?person2 ?person2_id
  (xsd:float(str(?person2_rank)) as ?rank)
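
Put together with the query from the question, the whole thing might look like the sketch below. Everything except the SELECT clause is copied from the question (with the unused prefixes dropped). Whether the duplicates disappear may depend on the Virtuoso version behind the endpoint, so treat it as something to try rather than a guaranteed fix.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX vrank: <http://purl.org/voc/vrank#>

# Project the rank through its lexical form so that DISTINCT compares
# the re-parsed value rather than the stored float.
SELECT DISTINCT
  ?person1 ?person1_id
  ?person2 ?person2_id
  (xsd:float(str(?person2_rank)) as ?rank)
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
    ?person1 rdf:type dbpedia-owl:Person.
    ?person2 rdf:type dbpedia-owl:Person.
    ?person1 ?link ?person2.
    ?person1 dbpedia-owl:wikiPageID ?person1_id.
    ?person2 dbpedia-owl:wikiPageID ?person2_id.
    ?person2 vrank:hasRank/vrank:rankValue ?person2_rank.
    FILTER (?person1_id != ?person2_id).
    FILTER (?person1_id = 308)
}
# ORDER BY still refers to the original ?person2_rank binding from the WHERE clause.
ORDER BY DESC(?person2_rank) ASC(?person2_id)
```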

Upvotes: 6
