Reputation: 251
The stats.grok.se tool provides pageview statistics for a particular Wikipedia page. Is there a way to get the same information through the Wikipedia API? What does the page view counter property actually mean?
Upvotes: 22
Views: 17189
Reputation: 354
This question was asked six years ago, when the official site had no such API.
That has since changed.
A simple example:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles=Buckingham+Palace%7CBank+of+England%7CBritish+Museum
From the documentation:
Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days). The result format is page title (with underscores) => date (Ymd) => count.
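As a rough sketch of how the query above could be built and read from a script: the helper names below (build_pageviews_url, extract_pageviews, fetch_pageviews) and the User-Agent string are made up for illustration, and the response layout (query, then pages, then a per-page pageviews mapping) is an assumption about the JSON the API returns, so check it against a real response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_pageviews_url(titles, days=None):
    """Build a prop=pageviews query URL for one or more page titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageviews",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
    }
    if days is not None:
        params["pvipdays"] = days  # number of days to report, per the docs
    return API_URL + "?" + urlencode(params)


def extract_pageviews(response):
    """Map each page title to its date => count mapping.

    Assumes the response looks like
    {"query": {"pages": {<id>: {"title": ..., "pageviews": {...}}}}}.
    """
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page.get("pageviews", {}) for page in pages.values()}


def fetch_pageviews(titles, days=None):
    """Perform the request (needs network access)."""
    request = Request(
        build_pageviews_url(titles, days),
        # Hypothetical identifier; use something that describes your own tool.
        headers={"User-Agent": "pageviews-example/0.1"},
    )
    with urlopen(request) as reply:
        return extract_pageviews(json.load(reply))
```

Calling build_pageviews_url(["Buckingham Palace", "Bank of England", "British Museum"]) reproduces the example URL above; fetch_pageviews with the same list would then return one date => count mapping per title, assuming the response layout described.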
Upvotes: 2
Reputation: 94794
No, there is not.
The counter property returned from prop=info would tell you how many times the page was viewed from the server. It is disabled on Wikipedia and other Wikimedia wikis because the aggressive Squid/Varnish caching means only a tiny fraction of page views would make it to the actual server in order to affect that counter, and even then the increased database write load for updating that counter would probably be prohibitive.
The stats.grok.se tool uses anonymized logs from the cache servers to calculate page views; the raw log files are available from http://dammit.lt/wikistats. If you need an API to access the data from stats.grok.se, you should contact the operator of stats.grok.se to request one be created.
Note this was written 4 years ago, and an API has since been created (see this answer). There's not yet a way to access that via api.php, though.
Upvotes: 7
Reputation: 28160
The Pageview API was released a few days ago: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
For example https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Foo/daily/20151010/20151012 will give you
{
  "items": [
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101000",
      "access": "all-access",
      "agent": "all-agents",
      "views": 79
    },
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101100",
      "access": "all-access",
      "agent": "all-agents",
      "views": 81
    }
  ]
}
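For anyone scripting against this endpoint, here is a minimal sketch that fills in the URL template and totals the views from a response shaped like the one above. The function names (per_article_url, total_views, fetch_total_views), the default argument values, and the User-Agent string are illustrative choices, not part of the API.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents",
                    granularity="daily"):
    """Fill in the per-article URL template; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces; percent-encode anything else unsafe.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE_URL, project, access, agent, title,
                     granularity, start, end])


def total_views(payload):
    """Sum the "views" field over every entry in "items"."""
    return sum(item["views"] for item in payload["items"])


def fetch_total_views(project, article, start, end):
    """Perform the request (needs network access) and total the views."""
    request = Request(
        per_article_url(project, article, start, end),
        # Hypothetical identifier; use something that describes your own tool.
        headers={"User-Agent": "pageviews-example/0.1"},
    )
    with urlopen(request) as reply:
        return total_views(json.load(reply))
```

per_article_url("en.wikipedia", "Foo", "20151010", "20151012") yields the example URL above, and total_views applied to the two items shown gives 79 + 81 = 160.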
Upvotes: 26
Reputation: 2454
You can get the daily JSON for the last 30 days like this:
http://stats.grok.se/json/en/latest30/Britney_Spears
Upvotes: 3
Reputation: 155
There doesn't seem to be any API; however, you can make HTTP requests to stats.grok.se and parse the HTML or JSON result to extract the page view counts.
I created a website http://wikipediaviews.org that does exactly that in order to facilitate easier comparison for multiple pages across multiple months and years. To speed things up, and minimize the number of requests to stats.grok.se, I keep all past query results stored locally.
The code I used is available at http://github.com/vipulnaik/wikipediaviews.
The actual retrieval code is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/pageviewqueries.inc:
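function getpageviewsonline($page, $month, $language)
{
    $url = getpageviewsurl($page, $month, $language);
    // file_get_contents returns false if the request fails.
    $html = file_get_contents($url);
    if ($html === false) {
        return null;
    }
    // Grab the token that follows "has been viewed" on the stats page.
    if (!preg_match('/(?<=\bhas been viewed)\s+\K[^\s]+/', $html, $numberofpageviews)) {
        return null;
    }
    return $numberofpageviews[0];
}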
The code for getpageviewsurl is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/stringfunctions.inc:
function getpageviewsurl($page, $month, $language)
{
    // Page titles use underscores for spaces; apostrophes are percent-encoded.
    $page = str_replace(" ", "_", $page);
    $page = str_replace("'", "%27", $page);
    return "http://stats.grok.se/" . $language . "/" . $month . "/" . $page;
}
PS: In case the link to wikipediaviews.org doesn't work, it's because I registered the domain quite recently. Try http://wikipediaviews.subwiki.org instead in the interim.
Upvotes: 1