How can I search DBpedia for rdfs:labels that partially match a given term using SPARQL?

I am using this query to find all labels that contain the string "Medi":

select distinct ?label where 
{ 
    ?concept rdfs:label  ?label 
    filter contains(?label,"Medi") 
    filter(langMatches(lang(?label),"en")) 
}

However, as soon as I change the term from "Medi" to "Medicare", the query times out. How can I get it to work with longer terms like "Medicare", i.e. extract all labels that contain the word "Medicare"?

Upvotes: 1

Views: 345

Answers (1)

UninformedUser

Reputation: 8465

Your query has to iterate over all labels in DBpedia, which is a very large number, and apply a string containment check to each one. This is indeed expensive.

Even a count query leads to an "estimated timeout error":

select count(?label) where 
{ 
    ?concept rdfs:label  ?label 
    filter(regex(str(?label),"Medi")) 
    filter(langMatches(lang(?label),"en")) 
}

Two options:

  1. Virtuoso has some fulltext search support:

    SELECT DISTINCT ?label WHERE { 
      ?concept rdfs:label ?label .
      ?label bif:contains "Medicare" .
      FILTER(langMatches(lang(?label),"en"))
    }
    
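  2. The public DBpedia endpoint is shared, so its resource limits are strict. The alternative is to load the DBpedia dataset into your own triple store, e.g. Virtuoso. There you can raise the maximum estimated execution time yourself (in Virtuoso, the MaxQueryCostEstimationTime setting in the [SPARQL] section of virtuoso.ini).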
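
For option 1, here is a slightly fuller sketch of the full-text query. It assumes a Virtuoso-backed endpoint (such as the public DBpedia one) where the bif: prefix is predefined. bif:contains is a Virtuoso extension, not standard SPARQL, and the LIMIT value is an arbitrary choice for illustration.

```sparql
# Sketch only: assumes a Virtuoso endpoint with a full-text index on literals.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?concept ?label WHERE {
  ?concept rdfs:label ?label .
  # The inner single quotes delimit the term inside Virtuoso's
  # text-search expression; they are needed for multi-word phrases.
  ?label bif:contains "'Medicare'" .
  FILTER(langMatches(lang(?label), "en"))
}
LIMIT 100
```

Note that bif:contains matches indexed words rather than arbitrary substrings, so it behaves differently from contains() or regex(). For prefix matching, Virtuoso's text-search syntax accepts a trailing wildcard, e.g. "'Medi*'". As far as I know, the wildcard needs at least four leading characters.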

Upvotes: 2
