Batu Sayıcı

Reputation: 11

How to get count of articles in sub-sub-categories with MediaWiki API

I want to generate XML/JSON that shows the category info (number of pages and subcategories) for every sub-subcategory of a specific Wikipedia category. This requires traversing two levels of the hierarchy: category > list of subcategories > list of sub-subcategories > number of articles per sub-subcategory.

Using the Wikipedia API, the following query gives me what I'm looking for, but only for one level of subcategories:

http://en.wikipedia.org/w/api.php?action=query&format=json&generator=categorymembers&gcmtitle=Category:People_by_nationality_and_occupation&gcmlimit=30&gcmprop=ids|title&prop=categoryinfo&continue=
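The one-level query above can be built and read in Python roughly as follows. This is a minimal sketch using only the standard library. The helper names `build_categoryinfo_url` and `parse_categoryinfo` are hypothetical, and `gcmtype=subcat` is an assumption added to restrict the results to subcategories (the URL above returns all members). The counts live under `query.pages.<pageid>.categoryinfo`.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def build_categoryinfo_url(category, limit=30):
    """Hypothetical helper: the one-level query from the question, using the
    categorymembers generator so that prop=categoryinfo applies to each member."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmtype": "subcat",   # assumption: only subcategories are wanted
        "gcmlimit": limit,
        "prop": "categoryinfo",
        "continue": "",
    }
    return API_URL + "?" + urlencode(params)

def parse_categoryinfo(response):
    """Hypothetical helper: map each returned category title to its counts.

    categoryinfo carries 'size', 'pages', 'files' and 'subcats'; it can be
    missing for an empty category, so missing counts default to 0."""
    pages = response.get("query", {}).get("pages", {})
    counts = {}
    for page in pages.values():
        info = page.get("categoryinfo", {})
        counts[page["title"]] = {
            "pages": info.get("pages", 0),
            "subcats": info.get("subcats", 0),
        }
    return counts
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `parse_categoryinfo` gives one level of counts. The sub-subcategory level still needs a second round of requests, one per subcategory.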

Here's the script I'm running. It gives me dictionaries of sub-subcategories, but I can't get the page counts (categoryinfo) to show. How can I fix this?

```python
from wikitools import wiki, category, api

def get_category_members(category_name, depth, lang='en'):
    articles = {}
    if depth < 0:
        return articles

    # Begin crawling articles in category
    results = wikipedia_query({'list': 'categorymembers',
                               'cmtitle': category_name,
                               'cmtype': 'subcat',
                               'cmlimit': '300',
                               'action': 'query',
                               'prop': 'categoryinfo'}, lang)
    return results
    if 'categorymembers' in results.keys() and len(results['categorymembers']) > 0:
        for i, page in enumerate(results['categorymembers']):
            article = {page['title']: 'categoryinfo'}
            articles.update(article)
        return articles
```
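A likely reason the counts never appear is that `prop=categoryinfo` only attaches data to pages in the query's page set (supplied via `titles`, `pageids` or a `generator`). A `list=categorymembers` result is returned separately, so `prop` has nothing to act on. In addition, the script above returns `results` before its loop runs, and `{page['title']: 'categoryinfo'}` stores the literal string rather than the counts. The following is a minimal sketch of a two-level walk under these assumptions: it uses plain `urllib` instead of wikitools, the `fetch` parameter is a hypothetical hook added for testing, and the User-Agent string is a placeholder you should replace with your own contact details.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"

def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    req = Request(API_URL + "?" + urlencode(params),
                  headers={"User-Agent": "category-count-sketch/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return json.load(resp)

def subcategory_info(category, fetch=fetch_json):
    """Return {subcategory title: categoryinfo} for the direct subcategories
    of `category`, following API continuation until every batch is read."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",  # generator, not list, so prop= applies
        "gcmtitle": category,
        "gcmtype": "subcat",
        "gcmlimit": "max",
        "prop": "categoryinfo",
        "continue": "",
    }
    result = {}
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            # a page may appear in several batches; merge what each one carries
            result.setdefault(page["title"], {}).update(page.get("categoryinfo", {}))
        if "continue" not in data:
            return result
        # copy the continuation tokens into the next request
        params = {**params, **data["continue"]}

def get_category_members(category_name, depth, fetch=fetch_json):
    """Walk `depth` extra levels below category_name (depth=1 reaches the
    sub-subcategories) and return a nested dict of page/subcat counts."""
    tree = {}
    for title, info in subcategory_info(category_name, fetch).items():
        node = {"pages": info.get("pages", 0), "subcats": info.get("subcats", 0)}
        if depth > 0:
            node["children"] = get_category_members(title, depth - 1, fetch)
        tree[title] = node
    return tree
```

For the question's case, `json.dumps(get_category_members("Category:People_by_nationality_and_occupation", 1), indent=2)` would produce the nested JSON. The sketch sends one request per subcategory, so a large category means many requests. Wikipedia's category graph can contain cycles, and here it is the bounded `depth` that keeps the walk finite.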

Upvotes: 1

Views: 1459

Answers (1)

Nemo

Reputation: 2544

If this is for a Wikimedia project like Wikipedia, it's probably easier to rely on Magnus Manske's category recursion tools, such as catscan (see its code logic).

As far as I can see, your code doesn't create a Wiki object or an APIRequest, and it doesn't iterate recursively into subcategories. See an example where I used categorymembers as a generator to extract more information on category members (though not recursively in subcategories).

Upvotes: 2
