Hanan Mahmoud

Reputation: 65

How to filter DBpedia results in SPARQL

I have a small problem. Given this simple SPARQL query:

SELECT ?abstract
WHERE {
  <http://dbpedia.org/resource/Mitsubishi> <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER langMatches( lang(?abstract), 'en')
}

I get a result that still contains non-English characters. Is there any way to remove them and retrieve just the English words?

Upvotes: 0

Views: 341

Answers (1)

Joshua Taylor

Reputation: 85843

You'll need to define exactly which characters you want and don't want in your result, but you can use replace to replace characters outside a given range with, e.g., the empty string. If you wanted to exclude everything except the Basic Latin, Latin-1 Supplement, Latin Extended-A, and Latin Extended-B blocks (which together form the single range \u0000–\u024f), you could do the following:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?abstract ?cleanAbstract
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract
  FILTER langMatches( lang(?abstract), 'en')
  bind(replace(?abstract,"[^\\x{0000}-\\x{024f}]","") as ?cleanAbstract)
}


Or even simpler:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT (replace(?abstract_,"[^\\x{0000}-\\x{024f}]","") as ?abstract)
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract_
  FILTER langMatches(lang(?abstract_), 'en')
}

The cleaned abstract looks like this:

The Mitsubishi Group (, Mitsubishi Gurūpu) (also known as the Mitsubishi Group of Companies or Mitsubishi Companies) is a group of autonomous Japanese multinational companies covering a range of businesses which share the Mitsubishi brand, trademark, and legacy.The Mitsubishi group of companies form a loose entity, the Mitsubishi Keiretsu, which is often referenced in Japanese and US media and official reports; in general these companies all descend from the zaibatsu of the same name. The top 25 companies are also members of the Mitsubishi Kin'yōkai, or "Friday Club", and meet monthly. In addition the Mitsubishi.com Committee exists to facilitate communication and access of the Mitsubishi brand through a portal web site.
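If you would rather clean the text after fetching it (for example, if your endpoint's regex engine may not accept the \x{...} escape), the same character-class replacement can be done client-side. This is a minimal Python sketch, not part of the original answer; the function name clean_abstract and the sample string are made up for illustration and are not taken from DBpedia. Python's re module spells code points as \uXXXX rather than \x{XXXX}.

```python
import re

# Same idea as the SPARQL replace(): drop every character outside U+0000..U+024F
# (Basic Latin through Latin Extended-B). Python's re writes code points as \uXXXX.
NON_LATIN = re.compile(r"[^\u0000-\u024f]")


def clean_abstract(text: str) -> str:
    """Hypothetical helper: remove characters outside Basic Latin .. Latin Extended-B."""
    return NON_LATIN.sub("", text)


if __name__ == "__main__":
    # Illustrative sample only; the Japanese characters are removed,
    # while the macron in "Gurūpu" (U+016B, Latin Extended-A) is kept.
    sample = "Mitsubishi Group (三菱グループ, Mitsubishi Gurūpu)"
    print(clean_abstract(sample))
```

As in the SPARQL version, removed characters leave no placeholder, which is why the cleaned abstract shows an empty "(, " where the Japanese name used to be.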

You may find the Wikipedia article "Latin script in Unicode" useful.
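To see why the four blocks collapse into one range, here is a small Python sketch (an illustration, not from the original answer) that checks the blocks are contiguous and rebuilds the character class exactly as it is written inside the SPARQL string literal, with each backslash doubled. The block boundaries are the standard Unicode ones; the helper names merged_range and sparql_char_class are hypothetical.

```python
# Unicode blocks named in the answer, as (name, first code point, last code point).
LATIN_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
]


def merged_range(blocks):
    """Return (start, end) if the blocks are contiguous; raise ValueError on a gap."""
    ordered = sorted(blocks, key=lambda b: b[1])
    start, end = ordered[0][1], ordered[0][2]
    for name, lo, hi in ordered[1:]:
        if lo != end + 1:
            raise ValueError(f"gap before {name}")
        end = hi
    return start, end


def sparql_char_class(start, end):
    """Build the negated class as written in a SPARQL string literal (backslashes doubled)."""
    return f"[^\\\\x{{{start:04x}}}-\\\\x{{{end:04x}}}]"


if __name__ == "__main__":
    lo, hi = merged_range(LATIN_BLOCKS)
    print(hex(lo), hex(hi))
    print(sparql_char_class(lo, hi))
```

Because there are no gaps between the blocks, a single range in the regex is enough; if you chose non-adjacent blocks instead, you would need several ranges inside the brackets.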

Upvotes: 3
