Shamshad Alam
Shamshad Alam

Reputation: 1874

Counting session from each page from the data exported to bigquery from google analytics

I have been trying to count the session for each page using bigquery where data is exported to bigquery from GA. The schema of the data can be found here.

I have tried following query

SELECT
    hits.page.pagePath AS page,
    COUNT(totals.visits) AS sessions
FROM
    [xxxxxxx.ga_sessions_20160801]
WHERE
    REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
    AND hits.type = 'PAGE'
GROUP BY 
    page
ORDER BY
    sessions DESC

I compared the result of the query with numbers that I get from the GA but the result is quite different. I expected that above query would give total session for each page but it gives total pageviews for each page. In other words result of above query exactly match with pageviews of each page instead of sessions of each page.

I also tried the following query

SELECT
    hits.page.pagePath AS page,
    COUNT(hits.isEntrance) AS sessions
FROM
    [xxxxxxx.ga_sessions_20160801]
WHERE
    REGEXP_MATCH(hits.page.pagePath, r'(orderComplete|checkout)')
    AND hits.type = 'PAGE'
GROUP BY 
    page
ORDER BY
    sessions DESC

The result this time is very close to actual but not exactly the same as numbers that I am getting from GA. This time bigquery result is slightly less than that of the GA for some pages.

There is no sampling in GA in my case otherwise result is acceptable because error is between 0.5% to 4%

I am working with raw data without any filter on GA profile and same data is exported to bigquery.

Question: How is session counted when we count session by pages?

When I don't group the result by hits.page.pagePath there is no mismatch of results that I get from GA and that from bigquery

Upvotes: 0

Views: 1065

Answers (2)

conan
conan

Reputation: 245

To count sessions, I use COUNT(visitId) instead of COUNT(totals.visits). This seems to give me numbers identical--or very, very close--to what I see in GA.

Upvotes: 0

Elliott Brossard
Elliott Brossard

Reputation: 33765

Instead of using COUNT(totals.visits), what if you use COUNT(1)? The results of COUNT will vary depending on whether you are using a repeated field. Possibly relevant question with some in depth answers: BigQuery flattens when using field with same name as repeated field

As an aside, standard SQL (uncheck "Use Legacy SQL" under "Show Options") has less surprising semantics around counting, although it would require you to be more explicit with operations on arrays in this case.

Upvotes: 0

Related Questions