Reputation: 2441
As input we have a Wikipedia page title, for which we want to extract the Wikipedia page ID. For this purpose I am using the following Python code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests

if __name__ == "__main__":
    # Query the MediaWiki API for the page that matches the given title
    url = "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Daniel cudmore businessman"
    result = requests.get(url).json()
    print(result)
I can't find the Wikipedia page IDs for the following titles. The first request returns:
{u'batchcomplete': u'', u'query': {u'pages': {u'-1': {u'ns': 0, u'missing': u'', u'title': u'Daniel cudmore businessman'}}}}
The actual ID of the page should be: 37030093
Here the problem is that the actual Wikipedia page title is "Daniel Cudmore (businessman)", whereas mine has the form "daniel cudmore businessman".
{u'batchcomplete': u'', u'query': {u'normalized': [{u'to': u'Prince david of georgia', u'from': u'prince david of georgia'}], u'pages': {u'-1': {u'ns': 0, u'missing': u'', u'title': u'Prince david of georgia'}}}}
The actual ID of the page should be: 3443932
Here the title of the Wikipedia page and the title that I used appear to be the same, so I can't find the problem.
On the DBpedia SPARQL endpoint:
SELECT ?id WHERE {
<http://dbpedia.org/resource/Daniel_Cudmore_(businessman)>
<http://dbpedia.org/ontology/wikiPageID> ?id}
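The same query could also be sent from Python. This is only a rough sketch: it assumes the public endpoint at https://dbpedia.org/sparql accepts the `query` and `format` URL parameters and returns standard SPARQL JSON results. The helper names `build_query` and `fetch_page_ids` are made up for illustration, and the code is written for Python 3 with only the standard library. Note that it still needs the exact resource name, so it does not solve the title-matching problem by itself.

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia endpoint
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource_name):
    """Build the wikiPageID query for an exact DBpedia resource name."""
    return (
        "SELECT ?id WHERE { "
        "<http://dbpedia.org/resource/%s> "
        "<http://dbpedia.org/ontology/wikiPageID> ?id }" % resource_name
    )


def fetch_page_ids(resource_name):
    """Run the query and return all page IDs found (needs network access)."""
    params = urllib.parse.urlencode({
        "query": build_query(resource_name),
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(SPARQL_ENDPOINT + "?" + params) as response:
        data = json.load(response)
    # Standard SPARQL JSON results: one binding per result row
    return [int(row["id"]["value"]) for row in data["results"]["bindings"]]
```

It would be called as `fetch_page_ids("Daniel_Cudmore_(businessman)")`.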
Upvotes: 1
Views: 305
Reputation: 8569
In the latter example ("Prince_david_of_georgia") the letter case differs (compare with "Prince_David_of_Georgia"), so that exact page doesn't exist on Wikipedia either.
You could use the Special:Search URL https://en.wikipedia.org/wiki/Special:Search/Prince_david_of_georgia to get the requested page and then retrieve the ID,
or get a list of suggestions from https://en.wikipedia.org/wiki/Special:Search/Daniel_Cudmore_businessman, which you can parse for the first entry. This will likely be your page. To double-check, compare the two titles with whitespace, parentheses, etc. removed, then retrieve the ID as you already did.
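A rough sketch of that idea follows. Instead of parsing the Special:Search HTML page, it uses the MediaWiki API search module (`action=query&list=search`), which returns the page ID with each hit. That is a substitution on my part, not the approach described above. The function names `normalize`, `pick_match` and `find_page_id` and the User-Agent string are made up for illustration. The code is Python 3 and standard library only; `requests` would work just as well.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def normalize(title):
    """Lowercase and drop whitespace, underscores, brackets and punctuation."""
    return re.sub(r"[\W_]+", "", title.lower())


def pick_match(query, hits):
    """Return the first hit whose normalized title equals the normalized query."""
    wanted = normalize(query)
    for hit in hits:
        if normalize(hit["title"]) == wanted:
            return hit
    return None


def find_page_id(query):
    """Search Wikipedia for the query and return the matching page ID, or None."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": 5,
        "format": "json",
    })
    request = urllib.request.Request(
        API_URL + "?" + params,
        # Hypothetical client name; replace it with your own
        headers={"User-Agent": "page-id-lookup-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    hit = pick_match(query, data["query"]["search"])
    return hit["pageid"] if hit else None
```

It would be called as `find_page_id("Daniel cudmore businessman")`, which needs network access. The strict comparison in `pick_match` means a query with a typo returns None rather than a wrong page; loosen it if you prefer to accept the first search hit.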
Upvotes: 1