Reputation: 3
I have a fairly large set of URLs (more than 8,500) that I want to query the Google Analytics API for, using R and the googleAnalyticsR package. I can loop through my set of URLs, but every row of the resulting data frame contains the same values: the totals for the whole host ID rather than figures for each URL.
Here is what I have so far:
library(googleAnalyticsR)
library(lubridate)
#Authorize with google
ga_auth()
ga.acc.list = ga_account_list()
my.id = 123456
#set time range
soty = floor_date(Sys.Date(), "year")
yesterday = floor_date(Sys.Date(), "day") - days(1)
#get some - in this case - random URLs
urls = c("example.com/de/", "example.com/us/", "example.com/en/")
urls = gsub("^example.com/", "ga:pagePath=~", urls)
df = data.frame()
#get data
for(i in urls){
  ga.data = google_analytics_4(my.id,
    date_range = c(soty, yesterday),
    metrics = c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate"),
    filters = urls[i])
  df = rbind(df, ga.data)
}
The result is that every row of the data frame contains the total statistics for the my.id domain.
Does anyone know a better way to tackle this, or does Google Analytics simply prevent us from querying it in this way?
Upvotes: 0
Views: 691
Reputation: 13334
What you're getting is normal: you only queried for metrics (c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate")), so you only get your metrics.
If you want to break down those metrics, you need to use dimensions:
https://developers.google.com/analytics/devguides/reporting/core/dimsmets
In your case you're interested in the ga:pagePath dimension, so something like this (untested code):
ga.data = google_analytics_4(my.id,
  date_range = c(soty, yesterday),
  dimensions = c("pagePath"),
  metrics = c("pageviews","avgTimeOnPage","entrances","bounceRate","exitRate"),
  filters = urls[i])
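One more thing worth checking in the loop from the question: in R, for (i in urls) binds i to each element of urls, not to its position, so urls[i] looks the element up by name and yields NA on an unnamed vector. That may be why the filter appeared to have no effect. The base-R sketch below illustrates this with the question's own URLs; it makes no API calls, and the names seen and fixed are only for illustration.

```r
# Build the filter strings exactly as in the question.
urls <- c("example.com/de/", "example.com/us/", "example.com/en/")
urls <- gsub("^example.com/", "ga:pagePath=~", urls)

# Loop as written in the question: i is a *value*, so urls[i] is a
# lookup by name on an unnamed vector and returns NA each time.
seen <- character(0)
for (i in urls) {
  seen <- c(seen, urls[i])
}
print(seen)

# Looping over positions (or using the value i directly) gives the
# intended filter strings.
fixed <- character(0)
for (i in seq_along(urls)) {
  fixed <- c(fixed, urls[i])
}
print(fixed)
```

Either iterate with seq_along(urls) and keep urls[i], or keep for (i in urls) and pass i itself as the filter.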
I advise you to use the Google Analytics Query Explorer until you get the desired results, then port the query to R.
As for the number of results, you might be limited to 1K by default until you increase max_rows. There is a hard limit of 10K rows per request from the API, which means you then have to use pagination to retrieve more results if needed. I see some examples in the R documentation with max=99999999; I don't know whether the R library automatically handles pagination beyond the first 10K or whether the authors are unaware of the hard limit:
batch_gadata <- google_analytics(id = ga_id,
  start = "2014-08-01", end = "2015-08-02",
  metrics = c("sessions", "bounceRate"),
  dimensions = c("source", "medium",
                 "landingPagePath",
                 "hour", "minute"),
  max = 99999999)
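If you do end up paginating by hand, the general pattern is to request one page at a time and stop when a page comes back short. The following is only a sketch in base R under stated assumptions: fetch_page is a hypothetical stand-in that slices a fake result set instead of calling the API, and the page size of 10,000 is the limit mentioned above. A real version would replace fetch_page with an actual request using whatever paging parameters the API or library exposes.

```r
# Fake result set standing in for a full API report (assumption, not real data).
all_rows <- data.frame(pagePath = sprintf("/page/%05d", 1:25000),
                       pageviews = 1:25000)

page_size <- 10000  # per-request row limit mentioned above

# Hypothetical stand-in for one API request: rows start .. start + size - 1.
fetch_page <- function(start, size) {
  end <- min(start + size - 1, nrow(all_rows))
  if (start > end) return(all_rows[0, ])  # nothing left: empty page
  all_rows[start:end, ]
}

pages <- list()
start <- 1
repeat {
  page <- fetch_page(start, page_size)
  pages[[length(pages) + 1]] <- page
  if (nrow(page) < page_size) break  # a short page means no more data
  start <- start + page_size
}

# Combine all pages into one data frame.
result <- do.call(rbind, pages)
```

Collecting the pages in a list and binding them once at the end also avoids growing a data frame with rbind inside the loop, which gets slow with thousands of iterations.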
Upvotes: 2