Mapping DBpedia types to Wikipedia Categories

Question

I am trying to map DBpedia types to Wikipedia categories. A simple example would be the following SPARQL query:

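```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
select distinct ?cat where {
  ?s a dbpedia-owl:LacrossePlayer ; dcterms:subject ?cat .
  filter(regex(str(?cat), 'players', 'i'))
} limit 100
```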

But this is highly inefficient, as it first has to map the DBpedia types to DBpedia named entities (resources) and then extract their corresponding Wikipedia categories. I am trying to do this mapping for a lot of other DBpedia types.
Is there a direct or more efficient way to do this?

Joshua Taylor · Accepted Answer

Improving the filter may help…

As an initial note, you may get some speedup if you remove or improve your filter. You can, of course, just remove it, but you could also make it more efficient, since you're not really using any special regular expression features. Just use

```sparql
filter contains(lcase(str(?cat)), 'players')
```

to check whether the URI of ?cat contains the string "players". It might even be better (I'm not sure) to grab the English rdfs:label of ?cat and check that, since you wouldn't have to do the case or string conversions.
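Put together, the question's query with the suggested filter might look like the sketch below. The PREFIX lines spell out namespaces that the DBpedia endpoint normally predefines. The commented-out lines show the label-based variant mentioned above; it is untested and assumes that category resources carry an English rdfs:label. Note that contains() on the label is case-sensitive, so a label beginning with "Players" would not match 'players'.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select distinct ?cat where {
  ?s a dbpedia-owl:LacrossePlayer ;
     dcterms:subject ?cat .
  # Plain substring test on the lowercased category IRI; no regex needed.
  filter contains(lcase(str(?cat)), 'players')
  # Untested alternative (assumes an English rdfs:label on each category):
  # comment out the filter above and uncomment these two lines.
  # ?cat rdfs:label ?label .
  # filter(langMatches(lang(?label), 'en') && contains(?label, 'players'))
}
limit 100
```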

… but there are lots of results.

> But this is highly inefficient as it has to first map the DBpedia types to DBpedia Named Entities (resources) and then extract their corresponding Wikipedia categories. I am trying to do this mapping for a lot of other DBpedia types. Is there a direct or more efficient way to do this?

I'm not sure exactly what's inefficient about this. The only way that DBpedia types and categories are associated is that resources have types (via rdf:type) and have categories (via dcterms:subject). If you want to find the connections, you'll need to find the instances of the type and the categories to which they belong. One thing you could look into is whether particular infoboxes both add categories to articles and are used in the infobox mappings that provide DBpedia types. That's the only way I can think of to get category/DBpedia-type pairs directly, without going through instances, and I don't know whether the current dataset has that kind of information.
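Going through instances means joining rdf:type and dcterms:subject on the same resource. As a sketch (not part of the original answer), the same join can also count how many instances of a type fall into each category, which makes it easier to see which categories are most characteristic of that type. LacrossePlayer and the limit of 20 are only example values.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>

# For each category, count the distinct instances of the type that belong to it.
select ?category (count(distinct ?s) as ?nInstances) where {
  ?s a dbpedia-owl:LacrossePlayer ;   # instances of the type (rdf:type)
     dcterms:subject ?category .      # categories of those instances
}
group by ?category
order by desc(?nInstances)
limit 20
```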

In general, since Wikipedia categories are not a type hierarchy, there will be lots of categories with which instances of any particular type are associated. For instance, we can count the number of categories associated with the types Fish and LacrossePlayer with a query like this:

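```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>

select ?type (count(distinct ?category) as ?nCategories) where {
  values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
  ?type ^a/dcterms:subject ?category
}
group by ?type
```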


type                                        nCategories
http://dbpedia.org/ontology/LacrossePlayer  346
http://dbpedia.org/ontology/Fish            2375

That query responds pretty quickly, and you can retrieve the categories themselves just as easily:

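```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>

select distinct ?type ?category where {
  values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
  ?type ^a/dcterms:subject ?category
}
order by ?type
limit 4000
```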

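The property path ?type ^a/dcterms:subject ?category used in these queries reads as "follow rdf:type backwards from the type to an instance, then dcterms:subject forwards to a category". For comparison, here is the counting query with the path written out as two explicit triple patterns; the intermediate variable ?s is just a name chosen for illustration. Because the query counts distinct categories, it should return the same numbers as the path form.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>

select ?type (count(distinct ?category) as ?nCategories) where {
  values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
  ?s a ?type .                   # the ^a step: ?s is an instance of ?type
  ?s dcterms:subject ?category   # the dcterms:subject step: a category of ?s
}
group by ?type
```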

When you start using types that have many more instances, though, these counts get big, and the queries take a while to return. For example, take a very common type like Place:

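```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>

select ?type (count(distinct ?category) as ?nCategories) where {
  values ?type { dbpedia-owl:Place }
  ?type ^a/dcterms:subject ?category
}
group by ?type
```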

type                               nCategories
http://dbpedia.org/ontology/Place  191172

I wouldn't suggest trying to pull all that data down from the remote server. If you want to extract it, you should load the data locally.
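If you do load the data locally (for example, the DBpedia instance-type and article-category dumps in a local triple store that supports SPARQL 1.1 Update), one option is to precompute the type/category pairs once, so that later lookups for a type no longer need a join over all of its instances. This is a sketch under assumptions: the graph IRI http://example.org/type-categories and the predicate ex:hasInstanceCategory are hypothetical names for illustration, not DBpedia vocabulary, and the filter keeping only DBpedia ontology classes is a choice you may want to change.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
# Hypothetical namespace for the precomputed mapping (not a DBpedia vocabulary).
PREFIX ex: <http://example.org/mapping#>

INSERT {
  GRAPH <http://example.org/type-categories> {
    ?type ex:hasInstanceCategory ?category .
  }
}
WHERE {
  # Same join as before: types of instances and the categories of those instances.
  ?type ^a/dcterms:subject ?category .
  # Keep only DBpedia ontology classes (drops owl:Thing and other vocabularies).
  FILTER(strstarts(str(?type), "http://dbpedia.org/ontology/"))
}
```

Afterwards, the categories for a single type, such as dbpedia-owl:LacrossePlayer, can be read from that graph with a single triple pattern.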
