dmartines

Reputation: 61

How can I filter disambiguated names in DBpedia?

I am trying to use the following SPARQL query to find the Rhône department in France.

SELECT * WHERE {
  ?x rdfs:label "Rhone"@en.
  ?x rdf:type dbpedia-owl:Place.
  ?x dbpedia-owl:abstract ?abstract.
  FILTER (LANGMATCHES(LANG(?abstract), 'en'))
}

Here is the DBpedia resource: http://dbpedia.org/page/Rh%C3%B4ne_(department)

The problem is that it keeps finding the river Rhone, and not the department (or state). If I change the rdf:type to PopulatedPlace or Settlement then the search doesn't find anything.

Is it possible for me to use '*' or '?' characters to broaden the search string, such as:

'* Rhone *'

as in:

SELECT * WHERE {
  ?x rdfs:label "*Rhone*"@en.
  ?x rdf:type dbpedia-owl:Settlement.
  ?x dbpedia-owl:abstract ?abstract.
  FILTER (LANGMATCHES(LANG(?abstract), 'en'))
}

Second, I presume it is OK to combine multiple rdf:type constraints in one query, such as:

SELECT * WHERE {
  ?x rdfs:label "Rhone"@en.
  ?x rdf:type dbpedia-owl:Place.
  ?x rdf:type dbpedia-owl:PopulatedPlace.
  ?x rdf:type dbpedia-owl:Settlement.
  ?x dbpedia-owl:abstract ?abstract.
  FILTER (LANGMATCHES(LANG(?abstract), 'en'))
}

Upvotes: 1

Views: 575

Answers (1)

Ben Companjen

Reputation: 1443

Yes, you can search with wildcards, sort of. A literal in a triple pattern, such as "*Rhone*"@en, is matched exactly, asterisks included, so it does not act as a wildcard. Instead, bind the label to a variable and test it against a regular expression: FILTER (regex(?label, "Rhone")) matches any label that contains "Rhone" anywhere. Be aware that this FILTER can make query execution slower.
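A minimal sketch of such a regex filter is shown here. The PREFIX declarations are written out as an assumption so the query is self-contained (the public DBpedia endpoint may already predefine them), and STR() is added so that regex() receives a plain string rather than a language-tagged literal.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?x ?label WHERE {
  ?x rdf:type dbpedia-owl:Place .
  # Bind the label to a variable instead of matching a fixed literal.
  ?x rdfs:label ?label .
  # Without ^ or $ anchors, the pattern may occur anywhere in the label.
  FILTER (regex(STR(?label), "Rhone"))
}
```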

However, this won't return the resource you want, as the department's label is "Rhône".

In a regular expression, the period (.) matches any single character. So if you're unsure of the spelling of "Rhône" and want it to appear anywhere in the label, you can use

SELECT * WHERE {
  ?x rdf:type dbpedia-owl:Place.
  ?x dbpedia-owl:abstract ?abstract.
  ?x rdfs:label ?label.
  FILTER (regex(?label, "Rh.ne") && LANGMATCHES(LANG(?abstract), 'en'))
}

This query takes a very long time to complete because of the regular expression, which has to be evaluated against every candidate label. I just tried it, and it timed out.
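One way to give the endpoint less work is sketched here. It has not been tested against DBpedia, so whether it avoids the timeout is an open question. The assumptions are that the wanted label starts with the name, so the pattern can be anchored with ^, and that only English labels are of interest. Several FILTERs in the same group are combined conjunctively, so they can be written separately.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?x ?label ?abstract WHERE {
  ?x rdf:type dbpedia-owl:Place .
  ?x rdfs:label ?label .
  ?x dbpedia-owl:abstract ?abstract .
  # Keep only English labels and abstracts.
  FILTER (LANGMATCHES(LANG(?label), "en"))
  FILTER (LANGMATCHES(LANG(?abstract), "en"))
  # ^ anchors the match at the start of the label; . stands for the
  # character whose spelling is uncertain (o or ô).
  FILTER (regex(STR(?label), "^Rh.ne"))
}
```

Because the pattern is anchored, a label in which the name appears only later in the string would no longer match.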

The third query is valid, but it will only match resources that are typed as dbpedia-owl:Place, dbpedia-owl:PopulatedPlace and dbpedia-owl:Settlement all at the same time. Rhône is all three.
If you want resources that are a dbpedia-owl:Place or a dbpedia-owl:PopulatedPlace or a dbpedia-owl:Settlement, use:

SELECT DISTINCT * WHERE {
  {
    ?x rdf:type dbpedia-owl:Place.
    ?x rdfs:label ?label.
    ?x dbpedia-owl:abstract ?abstract.
    FILTER (regex(?label, "Rhône") && LANGMATCHES(LANG(?abstract), 'en'))
  } UNION {
    ?x rdf:type dbpedia-owl:Settlement.
    ?x rdfs:label ?label.
    ?x dbpedia-owl:abstract ?abstract.
    FILTER (regex(?label, "Rhône") && LANGMATCHES(LANG(?abstract), 'en'))
  } UNION {
    ?x rdf:type dbpedia-owl:PopulatedPlace.
    ?x rdfs:label ?label.
    ?x dbpedia-owl:abstract ?abstract.
    FILTER (regex(?label, "Rhône") && LANGMATCHES(LANG(?abstract), 'en'))
  }
}
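On an endpoint that supports SPARQL 1.1, the same "any of these types" condition can be written more compactly with VALUES. This is a sketch, not a tested replacement: the variable name ?type and the explicit PREFIX lines are assumptions added for illustration.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

# ?type is left out of the projection so that a resource carrying
# several of the listed types is returned once, not once per type.
SELECT DISTINCT ?x ?label ?abstract WHERE {
  # ?type takes each listed class in turn, replacing the three UNION branches.
  VALUES ?type {
    dbpedia-owl:Place
    dbpedia-owl:PopulatedPlace
    dbpedia-owl:Settlement
  }
  ?x rdf:type ?type .
  ?x rdfs:label ?label .
  ?x dbpedia-owl:abstract ?abstract .
  FILTER (regex(STR(?label), "Rhône") && LANGMATCHES(LANG(?abstract), "en"))
}
```

The shared triple patterns and the FILTER are written once, which makes it harder for the branches to drift apart when the query is edited.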

Upvotes: 4
