zadrozny

Reputation: 1731

How to get the associated (English) Wikipedia page from a Wikidata page / Q number using the Wikidata dump?

For @en text alone, a single item from the Wikidata dump contains multiple names:

<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2000/01/rdf-schema#label> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2004/02/skos/core#prefLabel> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://schema.org/name> "Northern Ireland"@en .

On the Wikidata page for this item (http://www.wikidata.org/entity/Q26), which of these (if any) corresponds to the canonical name used for the associated English Wikipedia page?

Upvotes: 1

Views: 725

Answers (1)

Dan Scott

Reputation: 86

Grab the triple whose predicate is schema:isPartOf and whose object is the Wikipedia you want (for example, https://en.wikipedia.org/). The subject of that triple is the URL of the corresponding article.

Here's an example using Python's rdflib:

>>> import rdflib
>>> g = rdflib.Graph()
>>> r = g.parse("https://www.wikidata.org/entity/Q26.nt", format="nt")
>>> is_part_of = rdflib.URIRef("http://schema.org/isPartOf")
>>> en_wikipedia = rdflib.URIRef("https://en.wikipedia.org/")
>>> for s in g.subjects(is_part_of, en_wikipedia):  # sitelink subjects on English Wikipedia
...     print(s)
... 
https://en.wikipedia.org/wiki/Northern_Ireland

You can adjust this approach according to whatever parser you're using, of course.
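For the full dump, loading everything into an rdflib Graph is unlikely to fit in memory, so one option is to stream the N-Triples file line by line. The sketch below is a minimal, stdlib-only variant of the same idea. It assumes the dump you downloaded contains the sitelink triples (each article URL has a `schema:about` triple pointing at the item and a `schema:isPartOf` triple pointing at the site); the helper names `open_dump` and `sitelinks` are hypothetical, and the regex only handles triples whose object is an IRI, which is all this lookup needs.

```python
import bz2
import gzip
import re
from typing import Dict, Iterable, Iterator

# Matches an N-Triples line whose subject, predicate and object are all IRIs,
# e.g. <s> <p> <o> .   Triples with literal objects are skipped.
IRI_TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

SCHEMA_ABOUT = "http://schema.org/about"
SCHEMA_IS_PART_OF = "http://schema.org/isPartOf"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def open_dump(path: str) -> Iterator[str]:
    """Yield text lines from a plain, .gz or .bz2 N-Triples file (hypothetical helper)."""
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    with opener(path, "rt", encoding="utf-8") as f:
        yield from f


def sitelinks(lines: Iterable[str], site: str = "https://en.wikipedia.org/") -> Dict[str, str]:
    """Map Q ids to article URLs on `site` using schema:about + schema:isPartOf."""
    on_site = set()  # article URLs whose schema:isPartOf is `site`
    about = {}       # article URL -> Q id, taken from schema:about
    for line in lines:
        m = IRI_TRIPLE.match(line)
        if not m:
            continue
        s, p, o = m.groups()
        if p == SCHEMA_IS_PART_OF and o == site:
            on_site.add(s)
        # Prefilter on the URL prefix to keep memory down; membership
        # in `site` is still confirmed via the isPartOf triple below.
        elif p == SCHEMA_ABOUT and s.startswith(site) and o.startswith(ENTITY_PREFIX):
            about[s] = o[len(ENTITY_PREFIX):]
    return {qid: url for url, qid in about.items() if url in on_site}
```

Usage might look like `links = sitelinks(open_dump("latest-all.nt.bz2"))` followed by `links["Q26"]`, where the file name is only an example. Collecting both triple types before joining avoids relying on the order in which they appear in the dump.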

Upvotes: 1
