Reputation: 303
The following SPARQL query returns duplicates in Virtuoso even though it uses the DISTINCT clause. You can test the query against the public DBpedia endpoint. What is the problem with the query?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?person1 ?person1_id ?person2 ?person2_id ?person2_rank
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?person1 rdf:type dbpedia-owl:Person.
?person2 rdf:type dbpedia-owl:Person.
?person1 ?link ?person2.
?person1 dbpedia-owl:wikiPageID ?person1_id.
?person2 dbpedia-owl:wikiPageID ?person2_id.
?person2 vrank:hasRank/vrank:rankValue ?person2_rank.
FILTER (?person1_id != ?person2_id).
FILTER (?person1_id = 308)
} ORDER BY DESC(?person2_rank) ASC(?person2_id)
The results include rows that appear to be duplicates, e.g.:
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus 8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus 8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082
Upvotes: 2
Views: 408
Reputation: 567
Although this won't help with your DBpedia query, anyone who arrived here via a search on the title and who has control of the model and data may want to know that Virtuoso's double does not seem to suffer from the SELECT DISTINCT problem that occurs with float.
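If you do control the data, one way to act on this is to load rank values typed as double rather than float. The following is only a sketch: it assumes "double" here means xsd:double, and the graph IRI and person IRI are hypothetical placeholders, not anything from the DBpedia PageRank dataset.

```sparql
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>
PREFIX vrank: <http://purl.org/voc/vrank#>

# Hypothetical graph and resource IRIs, for illustration only.
INSERT DATA {
  GRAPH <http://example.org/ranks> {
    <http://example.org/person/1>
      # Store the rank as xsd:double instead of xsd:float.
      vrank:hasRank [ vrank:rankValue "27.281"^^xsd:double ] .
  }
}
```

Whether this avoids the duplicates on a given Virtuoso version is something to check against your own store.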
Upvotes: 0
Reputation: 85813
I can confirm that the results appear to contain duplicates. I'm not absolutely sure what causes them, but I wonder if it might have something to do with inexact equality for floating point numbers. If, instead of selecting the floating point numbers directly, you select their lexical forms (note the (str(...) as ?rank) at the end):
SELECT DISTINCT
?person1 ?person1_id
?person2 ?person2_id
(str(?person2_rank) as ?rank)
I get none of the duplicates. This might be worth reporting to the Virtuoso folks as a bug. For what it's worth, if you want floating point values for rank, you can use xsd:float as a function to turn that string back into a floating point value, and when I do that, with the select like the following, I still get the expected distinct results.
SELECT DISTINCT
?person1 ?person1_id
?person2 ?person2_id
(xsd:float(str(?person2_rank)) as ?rank)
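For completeness, here is how that SELECT clause might slot into the full query from the question. This is a sketch, not a tested fix: the only changes are the projection, the ordering on the new ?rank alias (an assumption, since the original ordered by ?person2_rank), and the removal of prefixes the query does not use.

```sparql
PREFIX rdf:         <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd:         <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX vrank:       <http://purl.org/voc/vrank#>

SELECT DISTINCT
  ?person1 ?person1_id
  ?person2 ?person2_id
  # Round-trip through the lexical form so DISTINCT compares rebuilt floats.
  (xsd:float(str(?person2_rank)) AS ?rank)
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
  ?person1 rdf:type dbpedia-owl:Person ;
           dbpedia-owl:wikiPageID ?person1_id .
  ?person2 rdf:type dbpedia-owl:Person ;
           dbpedia-owl:wikiPageID ?person2_id ;
           vrank:hasRank/vrank:rankValue ?person2_rank .
  ?person1 ?link ?person2 .
  FILTER (?person1_id != ?person2_id)
  FILTER (?person1_id = 308)
}
ORDER BY DESC(?rank) ASC(?person2_id)
```

If ordering on the alias misbehaves on your endpoint, ordering on ?person2_rank as in the original query is the alternative to try.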
Upvotes: 6