Reputation: 19
I want to find topics related to a given topic, and also measure the degree of relationship between multiple topics. To do this, I tried to fetch the topic's Wikipedia page and build a taxonomy from its categories (listed at the bottom of the page). I want to use the Python API for Wikipedia for this (https://wikipedia.readthedocs.org/en/latest/code.html#api). But when I extract the categories, it also returns the hidden categories, which are not normally visible on the page.
import wikipedia
import requests
import pprint
from bs4 import BeautifulSoup

wikipedia.set_lang("en")
query = input()  # page title to look up (raw_input() on Python 2)
WikiPage = wikipedia.page(title=query, auto_suggest=True)
cat = WikiPage.categories  # this list also contains the hidden categories
for i in cat:
    print(i)
I know the other option is to use a scraper. But I want to use the API to do this.
Upvotes: 1
Views: 735
Reputation: 8520
You can definitely use the API for this. Just append &clshow=!hidden to your category query, like this:
http://en.wikipedia.org/w/api.php?action=query&titles=Stack%20Overflow&prop=categories&clshow=!hidden
(I'm assuming English Wikipedia here, but the API is the same everywhere.)
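For illustration, here is a minimal sketch of issuing that same query from Python using only the standard library. The function names (build_params, extract_categories, visible_categories) and the User-Agent string are hypothetical placeholders, not part of any library, and the sketch assumes the default JSON response layout, in which "pages" is keyed by page ID. It also adds cllimit=max and format=json, which are not in the URL above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other wikis expose the same endpoint path.
API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical placeholder; replace with a string that identifies your script.
USER_AGENT = "category-example/0.1 (contact: you@example.org)"


def build_params(title):
    """Return query parameters for listing a page's non-hidden categories."""
    return {
        "action": "query",
        "titles": title,
        "prop": "categories",
        "clshow": "!hidden",  # exclude hidden categories
        "cllimit": "max",     # request as many categories as one call allows
        "format": "json",
    }


def extract_categories(data):
    """Collect category titles from a decoded action=query response."""
    titles = []
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        for category in page.get("categories", []):
            titles.append(category["title"])
    return titles


def visible_categories(title):
    """Fetch the non-hidden categories of one page (performs a network call)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(title))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_categories(json.load(response))
```

Calling visible_categories("Stack Overflow") should then return a list of titles of the form "Category:...". Keeping the response parsing in its own function makes it possible to check that part without touching the network.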
Also, just to be clear: there is no such thing as a “Python API” to Wikipedia, only the MediaWiki API, which you can call from any programming language. In your example code you are using a Python library (one of many) to access the MediaWiki API. This library does not seem to have an option for excluding hidden categories. For a list of other, perhaps more flexible, Python libraries, see http://www.mediawiki.org/wiki/API:Client_code#Python. Personally, I quite like wikitools for simple tasks like yours. With it, the query would look something like this:
from wikitools.wiki import Wiki
from wikitools.api import APIRequest

site = Wiki("http://fa.wikipedia.org/w/api.php")
site.login("username", "password")  # replace with your own credentials
params = {
    "action": "query",
    "titles": "سرریز_پشته",
    "prop": "categories",
    "clshow": "!hidden",  # leave out hidden categories
}
request = APIRequest(site, params)
result = request.query()
print(result)
Upvotes: 1