Reputation: 1874
I have been trying to count the session for each page using bigquery where data is exported to bigquery from GA. The schema of the data can be found here.
I have tried following query
SELECT
hits.page.pagePath AS page,
COUNT(totals.visits) AS sessions
FROM
[xxxxxxx.ga_sessions_20160801]
WHERE
REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
AND hits.type = 'PAGE'
GROUP BY
page
ORDER BY
sessions DESC
I compared the result of the query with numbers that I get from the GA but the result is quite different. I expected that above query would give total session for each page but it gives total pageviews
for each page. In other words result of above query exactly match with pageviews
of each page instead of sessions
of each page.
I also tried the following query
SELECT
hits.page.pagePath AS page,
COUNT(hits.isEntrance) AS sessions
FROM
[xxxxxxx.ga_sessions_20160801]
WHERE
REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
AND hits.type = 'PAGE'
GROUP BY
page
ORDER BY
sessions DESC
The result this time is very close to actual but not exactly the same as numbers that I am getting from GA. This time bigquery result is slightly less than that of the GA for some pages.
There is no sampling in GA in my case otherwise result is acceptable because error is between 0.5% to 4%
I am working with raw data without any filter on GA profile and same data is exported to bigquery.
Question: How is session counted when we count session by pages?
When I don't group the result by hits.page.pagePath
there is no mismatch of results that I get from GA and that from bigquery
Upvotes: 0
Views: 1065
Reputation: 245
To count sessions, I use COUNT(visitId)
instead of COUNT(totals.visits)
. This seems to give me numbers identical--or very, very close--to what I see in GA.
Upvotes: 0
Reputation: 33765
Instead of using COUNT(totals.visits)
, what if you use COUNT(1)
? The results of COUNT
will vary depending on whether you are using a repeated field. Possibly relevant question with some in depth answers: BigQuery flattens when using field with same name as repeated field
As an aside, standard SQL (uncheck "Use Legacy SQL" under "Show Options") has less surprising semantics around counting, although it would require you to be more explicit with operations on arrays in this case.
Upvotes: 0