Reputation: 11
I want to generate XML/JSON output that shows the category info (count of pages and subcats) for all sub-subcategories of a specific Wikipedia category. This means traversing two levels of the hierarchy: category > list of subcategories > list of sub-subcategories > number of articles per sub-subcategory.
Using the Wikipedia API, this gives me what I'm looking for, but only for one level of subcategories:
Here's the script I'm running, which gives me dictionaries of sub-sub-categories, but I can't get page counts (categoryinfo) to show. How can I fix this?
from wikitools import wiki, category, api

def get_category_members(category_name, depth, lang='en'):
    articles = {}
    if depth < 0:
        return articles

    # Begin crawling articles in category
    results = wikipedia_query({'list': 'categorymembers',
                               'cmtitle': category_name,
                               'cmtype': 'subcat',
                               'cmlimit': '300',
                               'action': 'query',
                               'prop': 'categoryinfo'}, lang)
    return results

    if 'categorymembers' in results.keys() and len(results['categorymembers']) > 0:
        for i, page in enumerate(results['categorymembers']):
            article = {page['title']: 'categoryinfo'}
            articles.update(article)
    return articles
Upvotes: 1
Views: 1459
Reputation: 2544
If this is for a Wikimedia project like Wikipedia, it's probably easier to rely on Magnus Manske's category recursion tools, like CatScan (see its code logic).
As far as I can see, your code doesn't create a Wiki object or an APIRequest, nor does it iterate over subcategories recursively. Also note that prop=categoryinfo only applies to pages selected via titles, pageids or a generator, not to the results of list=categorymembers, which is why no counts show up. See an example where I used categorymembers as a generator to extract more information on category members (not recursively in subcategories, though).
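To illustrate the generator idea, here is a minimal sketch. It is not the example linked above, and it calls the MediaWiki action API directly with the standard library rather than through wikitools. The function names (`fetch_json`, `subcategory_info`, `category_tree_info`), the User-Agent string and the shape of the returned dictionary are my own assumptions for illustration. With `generator=categorymembers`, the `prop=categoryinfo` data is attached to each subcategory, and the `depth` argument controls how many levels are followed.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint pattern for Wikipedia language editions.
API_URL = "https://{lang}.wikipedia.org/w/api.php"


def fetch_json(params, lang="en"):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL.format(lang=lang) + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; replace it with one identifying your script.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-info-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def subcategory_info(category_title, lang="en", fetch=fetch_json):
    """Return {subcategory title: categoryinfo dict} for one category."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",  # members become the pages queried
        "gcmtitle": category_title,      # generator parameters get a 'g' prefix
        "gcmtype": "subcat",
        "gcmlimit": "max",
        "prop": "categoryinfo",          # applied to each generated page
    }
    info = {}
    while True:
        data = fetch(params, lang)
        for page in data.get("query", {}).get("pages", {}).values():
            info[page["title"]] = page.get("categoryinfo", {})
        if "continue" not in data:
            return info
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}


def category_tree_info(category_title, depth, lang="en", fetch=fetch_json):
    """Follow subcategories 'depth' levels down, keeping categoryinfo."""
    if depth <= 0:
        return {}
    tree = {}
    for title, info in subcategory_info(category_title, lang, fetch).items():
        tree[title] = {
            "categoryinfo": info,
            "subcats": category_tree_info(title, depth - 1, lang, fetch),
        }
    return tree
```

Calling `category_tree_info("Category:Physics", 2)` should then give the subcategories and sub-subcategories with their `pages` and `subcats` counts, and `json.dumps` turns the result into JSON. The `fetch` parameter is only there so the network call can be swapped out, for example when testing offline.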
Upvotes: 2