user3089952
user3089952

Reputation: 19

Wikipedia Python API - Prevent hidden categories

I want to find topics related to a given topic and also the degree of relationship between multiple topics. For this, I tried to extract the Wiki Page of the Topic and build a taxonomy using the Categories of the topic (given at the bottom of the page). I want to use Python API of Wikipedia for this (https://wikipedia.readthedocs.org/en/latest/code.html#api). But when I extract categories, it returns the hidden categories too that are normally not visible on the Wiki Page.

import wikipedia

import requests
import pprint
from bs4 import BeautifulSoup
wikipedia.set_lang("en")
query = raw_input()
WikiPage = wikipedia.page(title = query,auto_suggest = True)
cat = WikiPage.categories
for i in cat:
    print i

I know the other option is to use a scraper. But I want to use the API to do this.

Upvotes: 1

Views: 735

Answers (1)

leo
leo

Reputation: 8520

You can definitely use the API for this. Just append &clshow=!hidden to your category query, like this:

http://en.wikipedia.org/w/api.php?action=query&titles=Stack%20Overflow&prop=categories&clshow=!hidden

(I'm assuming English Wikipedia here, but the API is the same everywhere.

Also, just to be clear: There is no such thing as a “Python API” to Wikipedia, just the MediaWiki API, that you can call from any programming language. In your example code you are using a Python library (one of many) to access the Wikipedia API. This library does not seem to have an option for excluding hidden categories. For a list of other, perhaps more flexible, Python libraries, see http://www.mediawiki.org/wiki/API:Client_code#Python. Personally I quite like wikitools for simple tasks like yours. It would then look something like this:

from wikitools.wiki import Wiki
from wikitools.api import APIRequest

site = Wiki("http://fa.wikipedia.org/w/api.php") 
site.login("username", "password")

params = {
          "action": "query",
          "titles": "سرریز_پشته",
          "prop": "categories",
          "clshow": "!hidden",
         }

request = APIRequest(site, params)
result = request.query()

echo result

Upvotes: 1

Related Questions