Karthik Raman

Reputation: 103

How to use Wikipedia API to get page statistics for all pages in a Category?

I am looking to identify the most popular pages in a Wikipedia category (for example, which graph algorithms had the highest page views in the last year?). However, there seems to be little up-to-date information on the Wikipedia APIs, especially for obtaining statistics.

For example, the Stack Overflow question "How to use Wikipedia API to get the page view statistics of a particular page in Wikipedia?" has answers that no longer seem to work.

I have dug around a bit but could not find any usable API. The closest thing I found is a really nice website, https://tools.wmflabs.org/pageviews/, where I could potentially do this manually by typing in page titles one by one (up to ten pages at most). I would appreciate any help. Thanks!

Upvotes: 4

Views: 2214

Answers (2)

Nemo

Reputation: 2544

TreeViews is a tool designed to do exactly this. Getting good data is going to be hard if your category contains thousands of pages, in which case you are better off doing the calculations yourself, as Krenair suggests.

Upvotes: 0

Krenair

Reputation: 610

You can use a MediaWiki API call like this to get the titles in the category:

https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics

Then you can use this to get page view statistics for each page (be careful of the rate limit):

https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
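For the first step, listing the category members, note that the categorymembers call returns only a limited number of titles per request, so a larger category has to be paged through with the "continue" values the API sends back. The following is a minimal Python sketch of that loop, not a definitive implementation. The function names and the User-Agent string are my own placeholders, and cmnamespace=0 is an assumption that you only want articles rather than subcategories or files.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent (assumption): replace with contact details for your own tool.
USER_AGENT = "category-pageviews-sketch/0.1 (example@example.org)"


def category_query_params(category, cont=None):
    """Build the query parameters for one page of list=categorymembers results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",  # assumption: articles only, no subcategories or files
        "cmlimit": "500",    # maximum for ordinary (non-bot) clients
        "format": "json",
    }
    if cont:
        # The previous response's "continue" object carries cmcontinue.
        params.update(cont)
    return params


def titles_from_response(data):
    """Extract page titles from a decoded categorymembers response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


def fetch_category_titles(category):
    """Request pages of results until the API stops returning "continue"."""
    titles, cont = [], None
    while True:
        query = urllib.parse.urlencode(category_query_params(category, cont))
        request = urllib.request.Request(
            API_URL + "?" + query, headers={"User-Agent": USER_AGENT}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        titles.extend(titles_from_response(data))
        cont = data.get("continue")
        if not cont:
            return titles
```

Calling fetch_category_titles("Category:Physics") should return the article titles directly in that category. It does not descend into subcategories; that would need a separate pass with cmnamespace=14.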

For example, daily views over the last year for the article "Physics" (a member of the Physics category): https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/Physics/daily/20151104/20161104
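A rough Python sketch of the per-article call, assuming the response is a JSON object with an "items" list whose entries each carry a "views" count. The helper names and the User-Agent string are placeholders of mine, not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
# Placeholder User-Agent (assumption): replace with contact details for your own tool.
USER_AGENT = "category-pageviews-sketch/0.1 (example@example.org)"


def pageviews_url(title, start, end, project="en.wikipedia.org",
                  access="all-access", agent="all-agents", granularity="daily"):
    """Build the per-article URL; start and end are YYYYMMDD strings."""
    # Titles use underscores in place of spaces and must be percent-encoded,
    # including any "/" in the title itself.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, article, granularity, start, end])


def total_views(payload):
    """Sum the "views" field over every item in a decoded response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_total_views(title, start, end):
    """Fetch and sum views for one article over the given period."""
    request = urllib.request.Request(
        pageviews_url(title, start, end), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return total_views(json.load(response))
```

With a list of titles in hand, something like sorted(titles, key=lambda t: fetch_total_views(t, "20151104", "20161104"), reverse=True) would rank them, although for more than a handful of pages you would want to cache the totals, pause between requests to respect the rate limit, and catch urllib.error.HTTPError for pages that have no data.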

If you're dealing with large categories, it may be better to download the raw statistics from https://dumps.wikimedia.org/other/pageviews/2016/2016-11/ (one directory per month) and aggregate them yourself, to avoid making so many REST API calls.
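If you go the dump route, the aggregation could look roughly like the Python sketch below. It assumes each hourly file is gzip-compressed text with one space-separated record per line in the form "domain_code page_title view_count response_size", that titles use underscores, and that "en" and "en.m" are the codes for desktop and mobile English Wikipedia. Check these assumptions against the documentation in the dumps directory before relying on them; the function names are my own.

```python
import gzip
from collections import Counter


def count_views(lines, wanted_titles, domain_codes=("en", "en.m")):
    """Sum view counts for the wanted titles over an iterable of dump lines."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed 4-field layout
        domain, title, views, _size = parts
        if domain in domain_codes and title in wanted_titles:
            totals[title] += int(views)
    return totals


def count_views_in_file(path, wanted_titles):
    """Run count_views over one downloaded hourly .gz file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_views(handle, wanted_titles)
```

Passing wanted_titles as a set of underscore-style titles (for example {"Physics", "Graph_theory"}) keeps the membership check fast, and the Counter objects from each hourly file can be added together to get totals for a longer period.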

Upvotes: 2
