Reputation: 31
I cannot seem to the get the same number of sessions and users that I have Google Analytics Premium in BigQuery. I use the fullvisitorID to calculate the number of sessions (count) and users (unique count) should this not correspond to the numbers in Google Analytics?
Upvotes: 3
Views: 1906
Reputation: 59355
Update: BQ now supports the EXACT_COUNT_DISTINCT() function.
As @Pentium10 says, COUNT(DISTINCT x) gives approximate results for big numbers on BigQuery.
2 alternatives for exact results:
COUNT(DISTINCT x, 10000)
Gives an exact count distinct, if said count is less than 10000. Change value to go higher (slower results)
SELECT COUNT(*) FROM (
SELECT x
FROM [table]
GROUP EACH BY x
)
Also counts distinct values in an exact way.
Upvotes: 1
Reputation: 280
While @Felipe Hoffa is correct that COUNT(DISTINCT x, 10000) will get you to a more precise number, it can can be further enhanced by making the 10,000 number into as much as 1,000,000 (Which I believe is the largest "Sampling" rate currently: BigQuery Documentation of Count Distinct). As not a whole lot of people are using the Google Anaytics Premium data within BigQuery, the community for these question is very small. As someone who does use GA data within BigQuery on a Daily Basis, I can tell you that my research & validation has shown that the following metric definitions are about as closely aligned as you can get to what a Google Analytics UnSampled Report will tell you.
Sessions
count(distinct concat(fullvisitorid, string(visitid)), 1000000) as sessions
Users
count(distinct fullvisitorid, 1000000) as users
New Users
count(distinct (case when totals.newvisits <> 0 then concat(fullvisitorid, string(visitid)) end), 1000000) as new_users
Pageviews
sum(case when hits.type = "PAGE" then 1 else 0 end) as pageviews
Unique Pageviews
count(distinct (case when hits.type = "PAGE" then concat(fullvisitorid, string(visitid), hits.page.pagepath) end), 1000000) as unique_pageviews
Bounces
count(distinct (case when totals.bounces <> 0 then concat(fullvisitorid, string(visitid)) end), 1000000) as bounces
Upvotes: 4