Arminmsg

Reputation: 621

Find the subjects that connect entities in DBpedia with SPARQL

I'm extracting entities from a text, and most of the time I get multiple entities, for example <http://dbpedia.org/resource/NASA>, <http://dbpedia.org/resource/IPhone> and <http://dbpedia.org/resource/Apple_Inc.>

These entities don't share the same dct:subject. Is there a way to query a path to get a list of the subjects that connect my entities?

My goal is to create a kind of "page rank" to find the most relevant subjects for a given entity.

Preferably with a counter of how many steps lie between them.

I've tried to brute-force it: start with an entity, get all its subjects, then get all entities for each subject, and so on, but the queries are starting to get enormous.

Upvotes: 0

Views: 337

Answers (1)

TallTed

Reputation: 9434

Springing from @AKSW's comments...

One option puts no limit on the length of the skos:broader path. It will exceed the resource consumption limits on the public DBpedia endpoint, but it could be run on a private instance (in the cloud or wherever) where you may relax those limits --

PREFIX   dbr:  <http://dbpedia.org/resource/>
PREFIX   dct:  <http://purl.org/dc/terms/>
PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?cat 
WHERE
  { <http://dbpedia.org/resource/Apple_Inc.>
        dct:subject/skos:broader*  ?cat . 
    dbr:IPhone 
        dct:subject/skos:broader*  ?cat . }

The succinct option, using Virtuoso-specific syntax (based on an early draft of SPARQL Property Paths) to limit the path's length (here requiring at least 1 skos:broader and permitting up to 3) --

PREFIX   dbr:  <http://dbpedia.org/resource/>
PREFIX   dct:  <http://purl.org/dc/terms/>
PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?cat 
WHERE
  { ?cat
       ^skos:broader{1,3}/^dct:subject
           <http://dbpedia.org/resource/Apple_Inc.> , 
           dbr:IPhone 
  }

Another succinct option, this time using standard SPARQL Property Paths syntax to limit the path's length --

PREFIX   dbr:  <http://dbpedia.org/resource/>
PREFIX   dct:  <http://purl.org/dc/terms/>
PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?cat 
WHERE
  { ?cat
       ^skos:broader/^skos:broader?/^skos:broader?/^dct:subject
           <http://dbpedia.org/resource/Apple_Inc.> , 
           dbr:IPhone 
  }

You can also use two statements with the uninverted paths in the WHERE clause (only the clause body is shown here), first in Virtuoso-specific form --

  { <http://dbpedia.org/resource/Apple_Inc.> 
       dct:subject/skos:broader{1,3}   ?cat  .
    dbr:IPhone 
       dct:subject/skos:broader{1,3}   ?cat  .
  }

-- and then in standard SPARQL --

  { <http://dbpedia.org/resource/Apple_Inc.> 
       dct:subject/skos:broader/skos:broader?/skos:broader?   ?cat  .
    dbr:IPhone 
       dct:subject/skos:broader/skos:broader?/skos:broader?   ?cat  .
  }
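
On the step counter asked for in the question: standard SPARQL property paths do not report how long a matched path was, so one workaround is to spell out each length as its own UNION branch and BIND the hop count. The following is a minimal sketch that I have not run against the public endpoint. The variable names ?d1, ?d2 and ?steps are my own, and "steps" is assumed here to mean the total number of skos:broader hops on both sides of the shared category (1 to 3 per side, matching the bounded queries above) --

```sparql
PREFIX   dbr:  <http://dbpedia.org/resource/>
PREFIX   dct:  <http://purl.org/dc/terms/>
PREFIX  skos:  <http://www.w3.org/2004/02/skos/core#>

SELECT ?cat (MIN(?d1 + ?d2) AS ?steps)
WHERE
  { # ?d1 = number of skos:broader hops from Apple_Inc.'s subjects up to ?cat
    {   { <http://dbpedia.org/resource/Apple_Inc.>
              dct:subject/skos:broader  ?cat .
          BIND(1 AS ?d1) }
      UNION
        { <http://dbpedia.org/resource/Apple_Inc.>
              dct:subject/skos:broader/skos:broader  ?cat .
          BIND(2 AS ?d1) }
      UNION
        { <http://dbpedia.org/resource/Apple_Inc.>
              dct:subject/skos:broader/skos:broader/skos:broader  ?cat .
          BIND(3 AS ?d1) }
    }
    # ?d2 = number of skos:broader hops from IPhone's subjects up to ?cat
    {   { dbr:IPhone
              dct:subject/skos:broader  ?cat .
          BIND(1 AS ?d2) }
      UNION
        { dbr:IPhone
              dct:subject/skos:broader/skos:broader  ?cat .
          BIND(2 AS ?d2) }
      UNION
        { dbr:IPhone
              dct:subject/skos:broader/skos:broader/skos:broader  ?cat .
          BIND(3 AS ?d2) }
    }
  }
GROUP BY ?cat          # a category may be reachable at several depths; keep the shortest
ORDER BY ?steps
```

Categories with the smallest ?steps are the closest common ancestors of the two entities. Allowing a deeper search means adding one more UNION branch per entity.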

Upvotes: 3
