Reputation: 4608
I am currently using pywikibot to obtain the categories of a given Wikipedia page (e.g., support-vector machine) as follows.
import pywikibot as pw
print([i.title() for i in list(pw.Page(pw.Site('en'), 'support-vector machine').categories())])
The result I get is:
[
'Category:All articles with specifically marked weasel-worded phrases',
'Category:All articles with unsourced statements',
'Category:Articles with specifically marked weasel-worded phrases from May 2018',
'Category:Articles with unsourced statements from June 2013',
'Category:Articles with unsourced statements from March 2017',
'Category:Articles with unsourced statements from March 2018',
'Category:CS1 maint: Uses editors parameter',
'Category:Classification algorithms',
'Category:Statistical classification',
'Category:Support vector machines',
'Category:Wikipedia articles needing clarification from November 2017',
'Category:Wikipedia articles with BNF identifiers',
'Category:Wikipedia articles with GND identifiers',
'Category:Wikipedia articles with LCCN identifiers'
]
As you can see, the results include a lot of Wikipedia tracking and maintenance categories, such as 'Category:All articles with unsourced statements'.
However, I am only interested in the informative categories, such as 'Category:Statistical classification'.
I am wondering if there is a way to get all tracking or maintenance Wikipedia categories, so that I can remove them from the results and keep only the informative categories.
Alternatively, please suggest any other ways of eliminating them from the results.
I am happy to provide more details if needed.
Upvotes: 3
Views: 1053
Reputation: 9086
pywikibot currently does not expose some of the API features for filtering hidden categories. You can do that manually by checking for the hidden key in categoryinfo:
import pywikibot as pw

site = pw.Site('en', 'wikipedia')
print([
    cat.title()
    for cat in pw.Page(site, 'support-vector machine').categories()
    if 'hidden' not in cat.categoryinfo
])
gives:
['Category:Classification algorithms',
'Category:Statistical classification',
'Category:Support vector machines']
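The same filter can be wrapped in a small helper so it is reusable across pages. This is only a sketch: `visible_category_titles` and `page_categories` are hypothetical names, not part of pywikibot, and the helper assumes each category object exposes `title()` and a `categoryinfo` mapping, as in the snippet above.

```python
def visible_category_titles(categories):
    """Return titles of categories whose categoryinfo lacks the 'hidden' key."""
    return [
        cat.title()
        for cat in categories
        if 'hidden' not in cat.categoryinfo
    ]


def page_categories(title, lang='en', family='wikipedia'):
    """Fetch the non-hidden categories of a page (needs pywikibot and network access)."""
    import pywikibot as pw  # imported lazily so the filter above can be used offline

    site = pw.Site(lang, family)
    return visible_category_titles(pw.Page(site, title).categories())
```

Calling `page_categories('support-vector machine')` should then behave like the one-off list comprehension, with one `categoryinfo` lookup per category.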
See https://www.mediawiki.org/wiki/Help:Categories#Hidden_categories and https://en.wikipedia.org/wiki/Wikipedia:Categorization#Hiding_categories for more info.
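If querying the MediaWiki API directly is an option, the `categories` property accepts `clshow=!hidden`, which asks the server to omit hidden categories. A minimal sketch using only the standard library follows; `build_category_query` is a hypothetical helper name, and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# English Wikipedia API endpoint (assumed target wiki for this example).
API_URL = 'https://en.wikipedia.org/w/api.php'


def build_category_query(title):
    """Build a query URL that lists only the non-hidden categories of a page."""
    params = {
        'action': 'query',
        'prop': 'categories',
        'titles': title,
        'clshow': '!hidden',  # exclude hidden (tracking/maintenance) categories
        'cllimit': 'max',     # return as many categories per request as allowed
        'format': 'json',
    }
    return API_URL + '?' + urlencode(params)


url = build_category_query('Support-vector machine')
```

The resulting URL can be fetched with any HTTP client; the category titles are expected under `query.pages.<pageid>.categories` in the JSON response.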
Upvotes: 3