Lukas

Reputation: 21

Pairwise comparison with SPARQL

I'd like to compare a collection of objects pairwise under a given similarity metric. The metric will be defined explicitly: some properties must match exactly, while others may differ only within a tolerance (e.g., when comparing floats, no more than a 50% SMAPE between the two values).

How would I go about constructing such a query? The output would ideally be an Nx2 table, where each row contains the two IRIs of a pair of comparable objects. Symmetric duplicates (e.g., the pair (1, 2) appearing again as (2, 1)) are acceptable, but avoiding them would be even better.

I would like to run this on all pairs with a single query. I would probably be able to figure out how to do it for a given object, but when querying across all objects simultaneously this problem becomes much more difficult.

Does anyone have insights into how to perform this?

Upvotes: 1

Views: 907

Answers (1)

Stanislav Kralin

Reputation: 11479

The idea is this:

PREFIX ex: <http://example.org/ex#>

SELECT DISTINCT ?subject1 ?subject2
WHERE {
     ?subject1 ex:belongs ex:commonCategory .
     ?subject2 ex:belongs ex:commonCategory .
     ?subject1 ex:exactProperty ?e .
     ?subject2 ex:exactProperty ?e .
     ?subject1 ex:approxProperty ?a1 .
     ?subject2 ex:approxProperty ?a2 .
     FILTER ( ?subject1 > ?subject2 ) .
     FILTER ( (abs(?a1-?a2)/(abs(?a1)+abs(?a2))) < 0.5 )
}

E.g., on DBpedia:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX umbel-rc: <http://umbel.org/umbel/rc/>

SELECT DISTINCT ?subject1 ?subject2
WHERE {
    ?subject1  rdf:type        umbel-rc:Actor .
    ?subject2  rdf:type        umbel-rc:Actor .
    ?subject1  dbo:spouse      ?spouse1 .
    ?subject2  dbo:spouse      ?spouse2 .
    ?subject1  dbo:wikiPageID  ?ID1 .
    ?subject2  dbo:wikiPageID  ?ID2 .
    FILTER    ( ?subject1 > ?subject2 ) .
    FILTER    ( ?spouse1  = ?spouse2 ) .
    FILTER    ( abs(?ID1-?ID2)/xsd:float(?ID1+?ID2) < 0.05 )
}

Thus, probably, Zsa Zsa Gabor and Magda Gabor are the same person.
Both were spouses of George Sanders, and their wikiPageID values are not very different from each other.

Some explanations:

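  • The ?subject1 > ?subject2 clause removes "permutation duplicates": each unordered pair is returned once, and no subject is paired with itself. If an endpoint does not support ordering IRIs directly, comparing STR(?subject1) > STR(?subject2) should work instead;
  • On the usage of xsd:float see this question. The cast is there so that the division is carried out in floating point rather than as integer division, which some endpoints apply to integer operands.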
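
For checking the query's results on a small sample, the same logic can be sketched outside SPARQL. The following is only an illustrative Python version of what the first query computes; the function names (smape, similar_pairs), the sample IRIs and values, and the assumption that each subject has exactly one exact and one approximate value are all hypothetical and not part of the original data.

```python
from itertools import combinations


def smape(a, b):
    """Relative difference |a - b| / (|a| + |b|), the form used in the FILTER."""
    denom = abs(a) + abs(b)
    if denom == 0:
        # Assumption: two zero values count as identical. The SPARQL query
        # as written would likely drop such a pair instead.
        return 0.0
    return abs(a - b) / denom


def similar_pairs(items, threshold=0.5):
    """items maps an IRI string to a tuple (exact_value, approx_value)."""
    pairs = []
    # combinations over the sorted IRIs yields each unordered pair once,
    # which plays the role of FILTER(?subject1 > ?subject2).
    for s2, s1 in combinations(sorted(items), 2):
        e1, a1 = items[s1]
        e2, a2 = items[s2]
        if e1 == e2 and smape(a1, a2) < threshold:
            pairs.append((s1, s2))
    return pairs


# Hypothetical sample data.
sample = {
    "http://example.org/ex#a": ("red", 10.0),
    "http://example.org/ex#b": ("red", 14.0),
    "http://example.org/ex#c": ("red", 40.0),
    "http://example.org/ex#d": ("blue", 10.0),
}

for s1, s2 in similar_pairs(sample):
    print(s1, s2)
```

Here ex#a and ex#c share the exact value but differ too much (30/50 = 0.6), while ex#d is excluded because its exact value differs, so only the pairs (ex#b, ex#a) and (ex#c, ex#b) are printed.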

Upvotes: 1
