I'd like to compare a collection of objects pairwise using a given similarity metric. The metric will be defined explicitly: some properties must match exactly, while others may differ only by a bounded amount (e.g., for floats, a SMAPE of no more than 50% between them).
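For reference, SMAPE between two values $a_1$ and $a_2$ is written in two common forms that differ by a factor of 2 (the subscripts 200 and 100 below are only labels for the two ranges):

```latex
\mathrm{SMAPE}_{200}(a_1, a_2) = \frac{|a_1 - a_2|}{\left(|a_1| + |a_2|\right)/2} \in [0, 2],
\qquad
\mathrm{SMAPE}_{100}(a_1, a_2) = \frac{|a_1 - a_2|}{|a_1| + |a_2|} \in [0, 1]
```

A 50% limit therefore means $|a_1 - a_2| / (|a_1| + |a_2|) < 0.25$ under the first form and $< 0.5$ under the second.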
How would I go about constructing such a query? Ideally, the output would be an N×2 table in which each row contains the IRIs of two comparable objects. Duplicates (i.e., reporting 1==2 as a match as well as 2==1) are acceptable, but it would be great if we could avoid them.
I would like to run this on all pairs with a single query. I could probably figure out how to do it for one given object, but querying across all objects at once makes the problem much harder.
Does anyone have insights into how to perform this?
The idea is this:
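```sparql
PREFIX ex: <http://example.org/ex#>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
  ?subject1 ex:belongs ex:commonCategory .
  ?subject2 ex:belongs ex:commonCategory .
  # sharing the variable ?e enforces an exact match on ex:exactProperty
  ?subject1 ex:exactProperty ?e .
  ?subject2 ex:exactProperty ?e .
  ?subject1 ex:approxProperty ?a1 .
  ?subject2 ex:approxProperty ?a2 .
  # keep one order of each pair; STR() makes the IRI comparison portable
  FILTER ( STR(?subject1) > STR(?subject2) )
  # |a1 - a2| / (|a1| + |a2|) lies in [0, 1]
  FILTER ( abs(?a1 - ?a2) / (abs(?a1) + abs(?a2)) < 0.5 )
}
```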
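To make the semantics of the query concrete, here is a minimal in-memory sketch in Python of the same join-and-filter logic. It is not how a triple store evaluates SPARQL; it assumes each subject has exactly one value per property, and the `Subject` record with its `category`, `exact` and `approx` fields is a hypothetical stand-in for `ex:belongs`, `ex:exactProperty` and `ex:approxProperty`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subject:
    """Hypothetical in-memory stand-in for one RDF subject."""
    iri: str
    category: str   # value of ex:belongs
    exact: str      # value of ex:exactProperty
    approx: float   # value of ex:approxProperty


def comparable_pairs(subjects: List[Subject], category: str,
                     threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Return the (subject1, subject2) pairs the query above would select."""
    # ?subjectN ex:belongs ex:commonCategory
    members = [s for s in subjects if s.category == category]
    pairs = set()  # a set plays the role of SELECT DISTINCT
    for s1 in members:
        for s2 in members:
            # FILTER(STR(?subject1) > STR(?subject2)): one order per pair, no self-pairs
            if not s1.iri > s2.iri:
                continue
            # shared variable ?e: the exact property must be identical
            if s1.exact != s2.exact:
                continue
            denom = abs(s1.approx) + abs(s2.approx)
            # both values zero: 0/0 is an error (or NaN) in SPARQL, so the FILTER rejects the pair
            if denom == 0:
                continue
            # FILTER(abs(?a1 - ?a2) / (abs(?a1) + abs(?a2)) < threshold)
            if abs(s1.approx - s2.approx) / denom < threshold:
                pairs.add((s1.iri, s2.iri))
    return sorted(pairs)


subjects = [
    Subject("ex:a", "commonCategory", "red", 10.0),
    Subject("ex:b", "commonCategory", "red", 14.0),
    Subject("ex:c", "commonCategory", "red", 40.0),
    Subject("ex:d", "commonCategory", "blue", 10.0),
]
print(comparable_pairs(subjects, "commonCategory"))
# [('ex:b', 'ex:a'), ('ex:c', 'ex:b')]
```

Note that the resulting relation is not transitive: `ex:a`/`ex:b` and `ex:b`/`ex:c` match, but `ex:a`/`ex:c` do not (30/50 = 0.6), so the output is a list of pairs rather than a partition into groups.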
E.g., on DBpedia:
```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX umbel-rc: <http://umbel.org/umbel/rc/>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
  ?subject1 rdf:type umbel-rc:Actor .
  ?subject2 rdf:type umbel-rc:Actor .
  ?subject1 dbo:spouse ?spouse1 .
  ?subject2 dbo:spouse ?spouse2 .
  ?subject1 dbo:wikiPageID ?ID1 .
  ?subject2 dbo:wikiPageID ?ID2 .
  FILTER ( STR(?subject1) > STR(?subject2) )
  FILTER ( ?spouse1 = ?spouse2 )
  # casting the denominator to xsd:float forces floating-point division
  FILTER ( abs(?ID1 - ?ID2) / xsd:float(?ID1 + ?ID2) < 0.05 )
}
```
Thus, by this metric, Zsa Zsa Gabor and Magda Gabor are "probably" the same person: both were spouses of George Sanders, and their wikiPageIDs are not very different from each other.
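The `xsd:float(...)` cast in the DBpedia query guards against an engine that divides two integers with truncating integer division (SPARQL 1.1 itself returns `xsd:decimal` for integer / integer, but the cast makes the intent explicit). This Python sketch uses two hypothetical, deliberately far-apart page IDs (not real DBpedia values) to show what truncation would do: the ratio collapses to 0 and every pair would pass the `< 0.05` test.

```python
def ratio_truncating(id1: int, id2: int) -> int:
    """Relative difference computed with truncating integer division (assumes positive IDs)."""
    return abs(id1 - id2) // (id1 + id2)


def ratio_float(id1: int, id2: int) -> float:
    """Relative difference with a float denominator, like xsd:float(?ID1 + ?ID2)."""
    return abs(id1 - id2) / float(id1 + id2)


# Hypothetical page IDs, chosen to be far apart (not real DBpedia data).
id1, id2 = 1_000, 9_000
print(ratio_truncating(id1, id2), ratio_truncating(id1, id2) < 0.05)  # 0 True
print(ratio_float(id1, id2), ratio_float(id1, id2) < 0.05)            # 0.8 False
```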
Some explanations:
- the `?subject1 > ?subject2` comparison removes "permutation duplicates", so each pair is returned once rather than as both (A, B) and (B, A);
- for `xsd:float`, see this question.
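The effect of the ordering comparison can be seen on a toy set of three subjects (hypothetical names), assuming every pair of them satisfies the other filters: a plain inequality keeps both orders of each pair, while the ordering keeps exactly one and also drops the trivial self-pairs.

```python
from itertools import product

subjects = ["ex:a", "ex:b", "ex:c"]  # hypothetical; assume every pair matches

# FILTER(?subject1 != ?subject2): both (x, y) and (y, x) survive
unordered = [(s1, s2) for s1, s2 in product(subjects, repeat=2) if s1 != s2]

# FILTER(STR(?subject1) > STR(?subject2)): only one order survives
ordered = [(s1, s2) for s1, s2 in product(subjects, repeat=2) if s1 > s2]

print(len(unordered), len(ordered))  # 6 3
print(ordered)  # [('ex:b', 'ex:a'), ('ex:c', 'ex:a'), ('ex:c', 'ex:b')]
```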