Reputation: 4733
Wikipedia's Random Article feature returns a random article, and I can also use RandomInCategory
to restrict it to the categories I want, which is what I need.
Now I want to get all the text of those articles, subject to some conditions/limitations:
I thought about using an HTML parser to do this, perhaps working with the IDs, classes, and headers, but I'm not sure that would be 100% accurate.
Can Wikipedia's API do this somehow?
Thank you!
I found this https://en.wikipedia.org/w/api.php?format=json&action=query&generator=random&grnnamespace=0&prop=revisions&rvprop=content&grnlimit=10 in another SO question and it's interesting. Could a category condition be added here, along with the ability to also get the languages?
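As a hedged sketch of what that URL could become: the MediaWiki API has no random-in-category generator, but `generator=categorymembers` (with `gcmtitle`) can stand in for `generator=random`, and `prop=langlinks` lists the other language editions of each page. The Python below only builds such a URL and filters an already-parsed JSON response. The helper names and the `required_langs` condition are illustrative assumptions, the content comes back as wikitext rather than plain text, and larger result sets may require following the API's `continue` parameters.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_category_query(category: str, limit: int = 10) -> str:
    """Build a query for up to `limit` articles in `category`, with wikitext and language links."""
    params = {
        "format": "json",
        "formatversion": "2",            # pages come back as a list, not a dict keyed by page ID
        "action": "query",
        "generator": "categorymembers",  # stands in for generator=random from the original URL
        "gcmtitle": f"Category:{category}",
        "gcmnamespace": "0",             # articles only
        "gcmlimit": str(limit),
        "prop": "revisions|langlinks",
        "rvprop": "content",
        "rvslots": "main",
        "lllimit": "max",
    }
    return f"{API_URL}?{urlencode(params)}"


def pages_with_languages(response: dict, required_langs: set) -> list:
    """Return (title, wikitext, languages) for pages available in every language in `required_langs`."""
    results = []
    for page in response.get("query", {}).get("pages", []):
        langs = {link["lang"] for link in page.get("langlinks", [])}
        if not required_langs <= langs:
            continue  # hypothetical condition: skip pages missing a required language
        revisions = page.get("revisions", [])
        text = revisions[0]["slots"]["main"]["content"] if revisions else ""
        results.append((page["title"], text, sorted(langs)))
    return results
```

Usage: fetch `build_category_query("Physics")` with any HTTP client, decode the JSON, then call `pages_with_languages(data, {"de", "fr"})` to keep only articles that also exist in German and French.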
Upvotes: 1
Views: 2886
Reputation: 1723
You can use Petscan to get lists of articles in any particular category (or combination of categories). I don't know of any functionality that automatically checks whether articles exist in other languages, but I presume some tools on Wikidata can help with that. You should be able to pass the pageids across and get a list of other languages. As for the actual data collection, I'd recommend the Python library Beautiful Soup.
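As a rough sketch of the parsing step, the class below collects the text of each `<p>` element and skips `<sup>` elements, which on Wikipedia typically hold citation markers such as `[1]`. It uses the standard library's `html.parser` in place of Beautiful Soup so it runs without third-party packages; with Beautiful Soup, `find_all("p")` plus `get_text()` covers similar ground. The HTML could come from the rendered article or from the API's `action=parse` output. Treating `<p>` as "the article text" and `<sup>` as "only citations" are simplifying assumptions, so headings, lists, and tables are ignored.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of every <p> element, skipping anything inside <sup> tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._sup_depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []
        elif tag == "sup" and self._in_p:
            self._sup_depth += 1  # assumed to be a citation marker

    def handle_endtag(self, tag):
        if tag == "sup" and self._sup_depth:
            self._sup_depth -= 1
        elif tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:  # drop empty spacer paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._sup_depth:
            self._buf.append(data)


def extract_paragraphs(html: str) -> list:
    """Return the non-empty paragraph texts found in `html`."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```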
Upvotes: 0
Reputation: 28160
You can use Special:RandomInCategory (there is no API equivalent). Note that it's not truly random (the distribution is not uniform). Other than the category (and the namespace), there is no way to add further conditions.
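Since there is no API equivalent, one workaround (a swapped-in technique, not something this answer proposes) is to fetch the category's members with `list=categorymembers` and sample from them client-side. Unlike RandomInCategory, this gives every direct member the same chance of being picked, at the cost of one request per batch of members (`cmlimit=max`, typically 500 for ordinary clients). This is a minimal sketch: the helper names and User-Agent string are placeholders, subcategories are not traversed, and the fetcher is passed in so the sampling logic can be tried without network access.

```python
import json
import random
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def http_fetch_json(params: dict) -> dict:
    """GET the API with `params`; Wikimedia asks clients to send a descriptive User-Agent."""
    req = Request(f"{API_URL}?{urlencode(params)}",
                  headers={"User-Agent": "random-in-category-sketch/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)


def iter_category_titles(category: str, fetch_json: Callable[[dict], dict]) -> Iterator[str]:
    """Yield titles of the articles directly in `category`, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": "0",  # articles only
        "cmlimit": "max",
    }
    while True:
        data = fetch_json(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        params = {**params, **data["continue"]}  # merge cmcontinue/continue into the next request


def random_in_category(category: str, k: int, fetch_json: Callable[[dict], dict],
                       rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to `k` distinct titles uniformly at random from the category's direct members."""
    titles = list(iter_category_titles(category, fetch_json))
    rng = rng if rng is not None else random.Random()
    return rng.sample(titles, min(k, len(titles)))
```

Usage: `random_in_category("Physics", 10, http_fetch_json)` returns up to ten article titles, whose text can then be fetched separately.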
Upvotes: 1