Reputation: 3379
I would like to count the number of pageviews by page in BigQuery, using Google Analytics data source tables. I only want to count pages that have the custom page content grouping of ProductList_UA
or ProductDetails_UA
and I want to trim all the parameters from the end of the page URL so that I return a more manageable list of pages.
So far, my query looks as below, but my count of pageviews, bounces and exits are far too high (about 8x) - where am I going wrong?
SELECT IFNULL(REGEXP_EXTRACT(hits.page.pagePath,r'^(.*?)\?'), hits.page.pagePath) AS Trimmed_Page, COUNT(hits.page.pagepath) AS Pageviews, SUM(totals.bounces) AS Bounces, SUM(IF(hits.isexit = TRUE, 1,0)) AS Exits, SUM(IF(hits.isentrance = TRUE, 1,0)) AS Entrances, MIN(hits.contentGroup.contentGroup3) AS Content_Group
FROM `xxx.ga_sessions_20*` AS m
CROSS JOIN UNNEST(m.customdimensions) AS customDimension
CROSS JOIN UNNEST(m.hits) AS hits
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 1 day) and
DATE_sub(current_date(), interval 1 day)
AND (hits.contentGroup.contentGroup3 = 'ProductList_UA' OR hits.contentGroup.contentGroup3 = 'ProductDetails_UA')
AND hits.type="PAGE"
AND hits.isInteraction = TRUE
GROUP BY Trimmed_Page
ORDER BY Pageviews DESC
LIMIT 1000
Upvotes: 0
Views: 595
Reputation: 1269
I suspect the cross join with customDimensions is the cause of you seeing more results than expected, as each row of hits will be multiplied by the number of customDimensions in that row. Experiment without that cross join to see if it solves the issue.
Upvotes: 1