Reputation: 245
I've been working on a (Standard) SQL query of GA data to find the number of sessions that include either of two URLs. However, filtering by the two URLs and pulling from hits.page.pagePath is where I run into problems.
This is one example of a query that gives me errors:
SELECT
date, COUNT(visitId) AS visits
FROM `123456789.ga_sessions_20170219`,`123456789.ga_sessions_20170220`
WHERE
REGEXP_CONTAINS(hits.page.pagePath, r"^/(apples|oranges).html.*")
GROUP BY date
And straight up flattening doesn't work either:
SELECT
date, COUNT(visitId) AS visits
FROM `123456789.ga_sessions_20170219`,`123456789.ga_sessions_20170220`
WHERE
REGEXP_CONTAINS((
SELECT pagePath
FROM UNNEST(
SELECT page
FROM UNNEST(hits) AS page) AS pagePath), r"^/(apples|oranges).html.*")
GROUP BY date
Also, I'm not sure if this approach will give me the data I want.
Upvotes: 1
Views: 2166
Reputation: 14014
First, you cannot use comma to UNION
tables in standard SQL (comma means CROSS JOIN
) - you can use _TABLE_SUFFIX
or just straight UNION ALL
instead.
And to count sessions which had hit on pages defined by regular expression, you can use the following EXISTS
clause:
SELECT
date, COUNT(visitId) AS visits
FROM
(SELECT * FROM `123456789.ga_sessions_20170219` UNION ALL
SELECT * FROM `123456789.ga_sessions_20170220`)
WHERE
EXISTS(
SELECT 1 FROM UNNEST(hits) h
WHERE REGEXP_CONTAINS(h.page.pagePath, r"^/(apples|oranges).html.*"))
GROUP BY date
Upvotes: 3