Haytam

Reputation: 4733

Get a random Wikipedia article with some conditions/limitations using the API

Wikipedia's Random Article feature returns a random article. I can also use RandomInCategory to restrict the result to the categories I want, which is what I need.

Now I want to get all the text inside the articles, given some conditions/limitations:

I thought about using an HTML parser for this, perhaps working with the IDs/classes/headers, but I'm not sure it would be 100% accurate.
Can Wikipedia's API do this somehow?

Thank you!


I found this https://en.wikipedia.org/w/api.php?format=json&action=query&generator=random&grnnamespace=0&prop=revisions&rvprop=content&grnlimit=10 in another SO question and it's interesting. Could a category condition be added here, and could it also return the languages?
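For reference, the URL above can be taken apart into its individual parameters, which makes it easier to see what each one does. This is only a minimal Python sketch: the function name `build_random_query` is made up for illustration, and the comments reflect my reading of the parameters.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_random_query(limit=10):
    """Rebuild the query URL quoted above from its individual parameters.

    `build_random_query` is a hypothetical helper name, not part of any library.
    """
    params = {
        "format": "json",        # response format
        "action": "query",
        "generator": "random",   # feed random pages into the query
        "grnnamespace": 0,       # main (article) namespace only
        "prop": "revisions",     # return revision data for each page
        "rvprop": "content",     # include the revision's wikitext
        "grnlimit": limit,       # how many random pages to return
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_random_query()
```

The resulting `url` can be passed to any HTTP client. Note that `rvprop=content` returns wikitext, not plain text.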

Upvotes: 1

Views: 2886

Answers (2)

smartse

Reputation: 1723

You can use PetScan to get lists of articles in any particular category (or combination of categories). I don't know of any functionality that automatically checks whether the articles exist in other languages, but I presume some tools at Wikidata can help you. You should be able to pass page IDs across and get a list of other languages. As for the actual data collection, I'd recommend the Python library Beautiful Soup.
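As a rough illustration of passing page IDs across to get the other languages, here is a sketch that uses the MediaWiki API's `prop=langlinks` rather than a Wikidata tool (that substitution is my assumption, not something tested against your use case). The helper names `build_langlinks_query` and `languages_by_title` are hypothetical, and the parsing assumes the default JSON response layout, where pages are keyed by page ID under `query.pages`.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_langlinks_query(pageids):
    """Build a URL asking for the interlanguage links of the given page IDs."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "langlinks",
        "lllimit": "max",  # as many language links per request as allowed
        "pageids": "|".join(str(p) for p in pageids),  # IDs are pipe-separated
    }
    return API_ENDPOINT + "?" + urlencode(params)


def languages_by_title(response):
    """Map each page title to the sorted language codes it links to.

    Pages without a "langlinks" key map to an empty list.
    """
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: sorted(link["lang"] for link in page.get("langlinks", []))
        for page in pages.values()
    }
```

`languages_by_title` takes the already-decoded JSON response as a `dict`. Long results are split across several responses, so a real script would also need to follow the API's continuation parameters.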

Upvotes: 0

Tgr

Reputation: 28160

You can use Special:RandomInCategory (no API equivalent). Note that it's not really random (not a uniform distribution). Other than that (and namespace) there is no way to add further conditions.
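If you need something scriptable, one possible workaround (a sketch of a different approach, not an equivalent of Special:RandomInCategory) is to list a category's members with `list=categorymembers` and pick one on the client side. The helper names below are hypothetical. The pick is uniform only over the members in the response you pass in, and subcategories are not traversed.

```python
import random
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_category_query(category, limit=500):
    """Build a URL listing up to `limit` main-namespace members of a category."""
    params = {
        "format": "json",
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmnamespace": 0,   # articles only, no subcategories or files
        "cmlimit": limit,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def pick_random_member(response, rng=random):
    """Return the title of one randomly chosen member, or None if there are none."""
    members = response.get("query", {}).get("categorymembers", [])
    if not members:
        return None
    return rng.choice(members)["title"]
```

For categories larger than one batch, you would have to follow the API's continuation parameters and collect all members before choosing, otherwise only the first batch can ever be picked.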

Upvotes: 1
