Luciano Mammino

Reputation: 811

analytics api weird results with sampled data

I'm developing a sidebar that shows Google Analytics stats next to my post details in the administration area.

To understand the Google Analytics API, I ran some tests using the Data Feed Query Explorer provided by Google itself.

After a few tests I noticed a strange behavior, which I'll summarize with a practical example. I need to know which keywords bring people to a given URL, and how many clicks I got from each of these keywords. I also want this information for three different date ranges: daily keywords, monthly keywords and overall keywords.

This is the set of parameters I used:

ids         = <myTableId>
dimensions  = ga:keyword
metrics     = ga:visits
segment     =
filters     = ga:pagePath=~<myUrl>$
sort        =
start-date  =
end-date    =
start-index =
max-results = 50

When I retrieve data for a single day (e.g. start-date = 2011-12-27 and end-date = 2011-12-27), everything seems to work properly.

For example, for my URL /programmazione/lo-schiaccianoci-in-3d-andrei-konchalovsky-2-dicembre-2011.film I got the following results:

ga:keyword                                  ga:visits
---                                         --- 
(not set)                                   1
lo schiaccianoci film a roma                1
lo schiaccianoci film programmazione roma   1
lo schiaccianoci film roma                  1
lo schiaccianoci programmazione a roma      1
programmazione film lo schiaccianoci a roma 1
schiaccianoci film programmazione           1
schiaccianoci film roma                     1

If I extend the time range, I start to see strange behavior. When I extended the request to all the days of the same month, I expected to get at least all the keywords retrieved for the single day (possibly with a higher number of clicks). Instead I got fewer results, along with a warning that says "This result is based on sampled data":

ga:keyword                      ga:visits 
---                             ---
(not set)                       31
lo schiaccianoci film roma 2011 31

If I try to retrieve all-time data (from the day I created the page to the current day), it gets even worse: I get no data at all!

So the question is: what's wrong with my approach?

--- UPDATE ---

I found this bug report: http://code.google.com/p/analytics-issues/issues/detail?id=160 Do you think it's related?

Upvotes: 1

Views: 823

Answers (1)

bkgraham

Reputation: 120

If your data is getting sampled, it will be of very poor quality. Sampling appears to be based on the total number of visits in the requested period, so reducing your request to a shorter period of time will generally eliminate it. You saw this yourself when you requested only a single day's data.

To fix the problem you must make multiple requests and aggregate the results yourself. It makes no sense, but that is the only way to fix the problem. We check the sample flag in the result set, reduce the time period, and re-request in a loop until we get all clean data.
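A minimal sketch of that re-request loop, under some assumptions: `fetch` is a hypothetical callable (not a function of any Google client library) that takes a start and end date and returns the keyword rows plus the sampled flag, and `fake_fetch` is a made-up stand-in so the example runs without network access. Halving the range is just one possible way to "reduce the time period"; any other splitting strategy would work too. Summing the pieces only makes sense for additive metrics such as ga:visits.

```python
from collections import Counter
from datetime import date, timedelta


def fetch_unsampled(fetch, start, end):
    """Return keyword -> visits for [start, end], splitting sampled ranges.

    `fetch` is a hypothetical callable (an assumption for this sketch) that
    takes two dates and returns (rows, is_sampled), where rows maps
    keyword -> visits.
    """
    rows, is_sampled = fetch(start, end)
    if not is_sampled or start == end:
        # Clean data, or a single day that cannot be split any further.
        return Counter(rows)

    # Sampled: discard this result and re-request the two halves of the range.
    middle = start + timedelta(days=(end - start).days // 2)
    first_half = fetch_unsampled(fetch, start, middle)
    second_half = fetch_unsampled(fetch, middle + timedelta(days=1), end)
    return first_half + second_half


def fake_fetch(start, end):
    """Stand-in for a real request: one visit per day, sampled beyond 7 days."""
    days = (end - start).days + 1
    return {"example keyword": days}, days > 7


if __name__ == "__main__":
    totals = fetch_unsampled(fake_fetch, date(2011, 12, 1), date(2011, 12, 31))
    print(dict(totals))
```

In real use you would replace `fake_fetch` with a function that issues the actual API query for the given dates and reads the sampled flag from the response.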

My own analysis shows that 40% of the time the sampled number of visits is off by 5% or more from the non-sampled value. That's for visits. Unique visitors are not calculated under sampling at all (the API just returns the visits number), and smaller data points like conversions become even more erratic.

Upvotes: 3
