Mads Sheye
Mads Sheye

Reputation: 31

BigQuery data versus Google Analytics

I cannot seem to the get the same number of sessions and users that I have Google Analytics Premium in BigQuery. I use the fullvisitorID to calculate the number of sessions (count) and users (unique count) should this not correspond to the numbers in Google Analytics?

Upvotes: 3

Views: 1906

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59355

Update: BQ now supports the EXACT_COUNT_DISTINCT() function.


As @Pentium10 says, COUNT(DISTINCT x) gives approximate results for big numbers on BigQuery.

2 alternatives for exact results:

COUNT(DISTINCT x, 10000)

Gives an exact count distinct, if said count is less than 10000. Change value to go higher (slower results)

SELECT COUNT(*) FROM (
  SELECT x
  FROM [table]
  GROUP EACH BY x
)

Also counts distinct values in an exact way.

Upvotes: 1

hoggkm
hoggkm

Reputation: 280

While @Felipe Hoffa is correct that COUNT(DISTINCT x, 10000) will get you to a more precise number, it can can be further enhanced by making the 10,000 number into as much as 1,000,000 (Which I believe is the largest "Sampling" rate currently: BigQuery Documentation of Count Distinct). As not a whole lot of people are using the Google Anaytics Premium data within BigQuery, the community for these question is very small. As someone who does use GA data within BigQuery on a Daily Basis, I can tell you that my research & validation has shown that the following metric definitions are about as closely aligned as you can get to what a Google Analytics UnSampled Report will tell you.

Sessions

count(distinct concat(fullvisitorid, string(visitid)), 1000000) as sessions

Users

count(distinct fullvisitorid, 1000000) as users

New Users

count(distinct (case when totals.newvisits <> 0 then concat(fullvisitorid, string(visitid)) end), 1000000) as new_users

Pageviews

sum(case when hits.type = "PAGE" then 1 else 0 end) as pageviews

Unique Pageviews

count(distinct (case when hits.type = "PAGE" then concat(fullvisitorid, string(visitid), hits.page.pagepath) end), 1000000) as unique_pageviews

Bounces

count(distinct (case when totals.bounces <> 0 then concat(fullvisitorid, string(visitid)) end), 1000000) as bounces

Upvotes: 4

Related Questions