Justin
Justin

Reputation: 154

DBpedia SPARQL filter does not apply to all results

A FILTER NOT EXISTS allows some results through when combined with OPTIONAL triples.

My query:

SELECT DISTINCT * WHERE 
{
  {
    ?en rdfs:label "N'Djamena"@en .
    BIND("N'Djamena" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Port Vila"@en .
    BIND("Port Vila" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Atafu"@en .
    BIND("Atafu" AS ?name) .
  }
  FILTER NOT EXISTS { ?en rdf:type skos:Concept } .
  OPTIONAL { ?en owl:sameAs ?es . FILTER regex(?es, "es.dbpedia") .  }
  OPTIONAL { ?en owl:sameAs ?pt . FILTER regex(?pt, "pt.dbpedia") .  }
} 
LIMIT 100

This query gets the three places as expected, but it also pulls back "Category:Atafu", which should be filtered out by virtue of having "rdf:type skos:Concept".

When used without the OPTIONAL lines, I get the three places expected. When used with those clauses non-optionally, I get only two of the countries, because Atafu doesn't have a page in Portuguese.

I can also move the FILTER NOT EXISTS statement into each of the UNION'd country blocks, but that seems to hurt the server's response time.

Why does the FILTER NOT EXISTS clause filter out "Category:N'Djamena" and Category:Port_Vila but not "Category:Atafu" when followed by OPTIONAL?

Upvotes: 3

Views: 452

Answers (1)

Joshua Taylor
Joshua Taylor

Reputation: 85813

I really have no idea why your query doesn't work. I'd have to chalk it up to some weird Virtuoso thing. There's definitely something strange going on. For instance, if you remove the bind for the last name, you'll get the resources you're expecting:

SELECT DISTINCT * WHERE 
{
  {
    ?en rdfs:label "N'Djamena"@en .
    BIND("N'Djamena" AS ?name) .
  }
  UNION {
    ?en rdfs:label "Port Vila"@en .
    BIND("Port Vila" AS ?name) .
  } 
  UNION {
    ?en rdfs:label "Atafu"@en .
  }
  FILTER NOT EXISTS { ?en rdf:type skos:Concept }
  OPTIONAL { ?en owl:sameAs ?es . FILTER regex(?es, "es.dbpedia") }
  OPTIONAL { ?en owl:sameAs ?pt . FILTER regex(?pt, "pt.dbpedia") .  }
} 
LIMIT 100

SPARQL results

It's really pretty weird. Here's a modified version of your query that gets the results you're looking for. It uses values instead of union, which makes the query simpler. It should be logically equivalent, though, so I'm not sure why it makes a difference.

select distinct * where {
  values ?label { "N'Djamena"@en "Port Vila"@en "Atafu"@en }
  ?en rdfs:label ?label .
  optional { ?en owl:sameAs ?pt . filter regex(?pt, "pt.dbpedia") }
  optional { ?en owl:sameAs ?es . filter regex(?es, "es.dbpedia") }
  filter not exists { ?en a skos:Concept }
  bind(str(?label) as ?name)
}

SPARQL results

I'd actually clean up the string matching though, since regular expressions are probably more power than you need here. You just want to check whether the value starts with a given substring:

select ?en ?label (str(?label) as ?name) ?es ?pt where {
  values ?label { "N'Djamena"@en "Port Vila"@en "Atafu"@en }
  ?en rdfs:label ?label .
  optional { ?en owl:sameAs ?pt . filter strstarts(str(?pt), "http://pt.dbpedia") }
  optional { ?en owl:sameAs ?es . filter strstarts(str(?es), "http://es.dbpedia") }
  filter not exists { ?en a skos:Concept }
}

SPARQL results

Upvotes: 3

Related Questions