Reputation: 254
I would like to use only medical data from Wikipedia for analysis. I use Python for scraping, and I have used the wikipedia library to look up a page by a query term:
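import wikipedia
import requests
import pprint
from bs4 import BeautifulSoup
wikipedia.set_lang("en")
# Python 3: input() and print() replace raw_input() and the print statement.
query = input()
WikiPage = wikipedia.page(title=query, auto_suggest=True)
cat = WikiPage.categories
# Print every category the page belongs to.
for i in cat:
    print(i)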
This gives me the categories of a page.
But my problem is the reverse:
I want to supply a category, for example health or medical terminology, and get all the articles in that category.
How can I do this?
Upvotes: 6
Views: 5038
Reputation: 8880
There is API:Categorymembers, which documents usage and parameters and gives examples on "how to retrieve lists of pages in a given category, ordered by title". It won't save you from having to descend through the category tree yourself (cf. below), but you get a clean entry point and machine-readable results.
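As a rough illustration of that entry point, here is a minimal sketch that lists the direct members of one category through `action=query&list=categorymembers`. It uses only the standard library; the `USER_AGENT` string, the helper names `fetch_json` and `category_members`, and the injectable `fetch` argument are my own assumptions, not part of the API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent (an assumption); replace it with your own contact details.
USER_AGENT = "category-members-example/0.1 (contact: you@example.org)"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def category_members(category, fetch=fetch_json):
    """Yield the member dicts of one category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,       # e.g. "Category:Medical terminology"
        "cmtype": "page|subcat",   # articles and subcategories, no files
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        # Pass a copy so the caller's view of each request stays unchanged.
        data = fetch(dict(params))
        for member in data["query"]["categorymembers"]:
            yield member
        if "continue" not in data:
            break
        # The reply carries the tokens needed to request the next batch.
        params.update(data["continue"])
```

Something like `list(category_members("Category:Medical terminology"))` should then give dicts with `pageid`, `ns` and `title` keys, where `ns == 14` marks a subcategory rather than an article. This covers only one level of the tree.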
A very brief pointer is given on the Help:Category page, section Searching for articles in categories:
In addition to browsing through hierarchies of categories, it is possible to use the search tool to find specific articles in specific categories. To search for articles in a specific category, type incategory:"CategoryName" in the search box.
An "OR" can be added to join the contents of one category with the contents of another. For example, enter
incategory:"Suspension bridges" OR incategory:"Bridges in New York City"
to return all pages that belong to either (or both) of the categories, as here.
Note that using search to find categories will not find articles which have been categorized using templates. This feature also doesn't return pages in subcategories.
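The same search expression can, as far as I can tell, also be sent programmatically through the API's `list=search` module via its `srsearch` parameter. The sketch below only builds the query string and the request parameters; the function names `incategory_query` and `search_params` are hypothetical, and the limitations quoted above (no subcategories, no template-based categorization) presumably still apply.

```python
def incategory_query(categories):
    """Join category names into an incategory search expression with OR."""
    return " OR ".join('incategory:"{}"'.format(name) for name in categories)


def search_params(categories, limit=50):
    """Build request parameters for the MediaWiki API's list=search module."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": incategory_query(categories),
        "srlimit": limit,
        "format": "json",
    }


print(incategory_query(["Suspension bridges", "Bridges in New York City"]))
# prints: incategory:"Suspension bridges" OR incategory:"Bridges in New York City"
```

The resulting dictionary would be URL-encoded and sent as a GET request to `https://en.wikipedia.org/w/api.php`; matching pages are expected under `query.search` in the JSON reply.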
To address the subcategory problem, the page Special:CategoryTree can be used instead. However, the page does not link to any obvious documentation, so I think the <form> fields would have to be looked up manually in the page source to build a programmatic interface around it.
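An alternative to reverse-engineering that form is to walk the subcategories yourself on top of API:Categorymembers. The following is a minimal sketch of a breadth-first descent, not a definitive implementation: `articles_in_tree` and its `get_members` callable are hypothetical names, and `get_members(category)` is assumed to return dicts with `ns` and `title` keys in the shape the API returns. Because Wikipedia's category graph can contain cycles and grows broad quickly, the sketch tracks visited categories and caps the depth.

```python
from collections import deque


def articles_in_tree(root, get_members, max_depth=2):
    """Collect article titles under `root`, descending into subcategories.

    `get_members(category)` is a hypothetical callable returning dicts with
    "ns" and "title" keys, as in API:Categorymembers replies.
    """
    seen = {root}                # categories already queued, guards against cycles
    articles = set()
    queue = deque([(root, 0)])   # (category title, depth below the root)
    while queue:
        category, depth = queue.popleft()
        for member in get_members(category):
            title = member["title"]
            if member["ns"] == 14:  # namespace 14 is "Category"
                if depth < max_depth and title not in seen:
                    seen.add(title)
                    queue.append((title, depth + 1))
            else:
                articles.add(title)
    return sorted(articles)
```

With `max_depth=0` only the direct members of the root category are returned; raising it widens the result, at the cost of pulling in loosely related articles from distant subcategories.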
Upvotes: 1