Reputation: 3663
I'd like to get the number of Wikipedia pages matching a condition. e.g.
Among many other ways, I can do this by indexing Wikipedia with Lucene, but that's pretty time-consuming.
Is there a way to perform this type of query with the MediaWiki API?
What is the query limit on the Wikipedia API?
Cheers, Mulone
Reputation: 50378
Try the list=search query. For example:
(Since you said you're only interested in the number of matching pages, I included srlimit=1 and srprop= in the query to minimize the extra information returned. Apparently there's no way to keep the API from returning at least the title of the first match, though; srlimit=0 just gives an error message.)
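As a rough illustration of the query described above, here is a minimal Python sketch. The endpoint URL (English Wikipedia), the helper names, and the User-Agent string are my own placeholders, and I'm assuming the hit count comes back under query.searchinfo.totalhits, which is where the search module normally reports it.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: English Wikipedia. Other wikis use a different host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_count_params(search_text):
    """Parameters for a list=search query that returns as little as possible."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": search_text,
        "srlimit": 1,            # srlimit=0 is rejected, so ask for one result
        "srprop": "",            # no snippets, sizes, timestamps, ...
        "srinfo": "totalhits",   # only the total number of matches
        "format": "json",
    }


def extract_total_hits(response):
    """Pull the match count out of a parsed JSON response."""
    return response["query"]["searchinfo"]["totalhits"]


def count_matching_pages(search_text,
                         user_agent="example-count-script/0.1 (placeholder contact)"):
    """Send one search query and return the number of matching pages."""
    url = API_URL + "?" + urllib.parse.urlencode(build_count_params(search_text))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as reply:
        return extract_total_hits(json.load(reply))
```

Calling count_matching_pages("some search text") performs a single request, so replace the placeholder User-Agent with something that identifies your script before running it against the live servers.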
As for query limits, there are limits on the number of results per query, but I don't think MediaWiki enforces any hard limits on the rate at which you query the API. MediaWiki does limit editing rates, but I don't think any such limits are currently applied for searching.
I believe the recommendation is to run your queries serially: wait for each query to finish before sending the next one. This provides a sort of automatic rate limiting, since your queries will take longer to complete when the servers are busy. If you want to play nice, you could also include a maxlag parameter in your queries (preferably with exponential backoff if a query fails). The maxlag mechanism is really designed more for automated edits than for searching, but it does at least ensure that your code will not hit Wikimedia's servers at times when they're particularly overloaded.
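The serial-plus-backoff idea could look roughly like the sketch below. It assumes each query is wrapped in a caller-supplied function (send_query, a hypothetical name) that returns the parsed JSON response, and that a lagged server answers with an error object whose code is "maxlag". The attempt count and delays are arbitrary choices of mine.

```python
import time


def is_maxlag_error(response):
    """True if the API refused the request because the servers are lagged."""
    return response.get("error", {}).get("code") == "maxlag"


def query_with_backoff(send_query, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run one query, retrying with exponentially growing waits on maxlag.

    send_query is a zero-argument callable that performs the request
    (with a maxlag parameter included) and returns the parsed response.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        response = send_query()
        if not is_maxlag_error(response):
            return response
        if attempt < max_attempts - 1:
            sleep(delay)   # wait before the next attempt
            delay *= 2     # double the wait each time
    raise RuntimeError("servers still lagged after %d attempts" % max_attempts)
```

Because query_with_backoff only returns once a query has finished, calling it in a plain loop also gives you the serial behaviour: the next query is never sent before the previous one completes.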
Also, if you want to do a lot of these kinds of queries, you might want to consider downloading a Wikipedia database dump and either indexing it yourself (as you mentioned in your question) or just reading it in a single pass and counting matching pages as you encounter them.
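For the single-pass option, something along these lines might work. It's only a sketch: it assumes an XML dump in which each page element contains a title and a revision with a text element, and the predicate is whatever condition you want to test. The function and helper names are made up for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_matching_pages_in_dump(source, predicate):
    """Stream through a dump and count pages where predicate(title, text) is true.

    source is a filename or a binary file object, for example the result
    of bz2.open() on a compressed dump.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ""
        text = ""
        for node in elem.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text or ""
            elif name == "text":
                text = node.text or ""
        if predicate(title, text):
            count += 1
        elem.clear()  # drop the page's contents so memory use stays modest
    return count
```

A call such as count_matching_pages_in_dump(dump_file, lambda title, text: "some phrase" in text) reads the dump once without building an index, so it only pays off if you check all your conditions in that same pass.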