conan
conan

Reputation: 245

Google BigQuery : number of sessions that visited a set of pages

I've been working on a (Standard) SQL query of GA data to find the number of sessions that include either of two URLs. However, filtering by the two URLs and pulling from hits.page.pagePath is where I run into problems.

This is one example of a query that gives me errors:

SELECT
  date, COUNT(visitId) AS visits
FROM `123456789.ga_sessions_20170219`,`123456789.ga_sessions_20170220`
WHERE
  REGEXP_CONTAINS(hits.page.pagePath, r"^/(apples|oranges).html.*")
GROUP BY date

And straight up flattening doesn't work either:

SELECT
  date, COUNT(visitId) AS visits
FROM `123456789.ga_sessions_20170219`,`123456789.ga_sessions_20170220`
WHERE
  REGEXP_CONTAINS((
    SELECT pagePath
    FROM UNNEST(
      SELECT page
      FROM UNNEST(hits) AS page) AS pagePath), r"^/(apples|oranges).html.*")
GROUP BY date

Also, I'm not sure if this approach will give me the data I want.

Upvotes: 1

Views: 2166

Answers (1)

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14014

First, you cannot use comma to UNION tables in standard SQL (comma means CROSS JOIN) - you can use _TABLE_SUFFIX or just straight UNION ALL instead. And to count sessions which had hit on pages defined by regular expression, you can use the following EXISTS clause:

SELECT
  date, COUNT(visitId) AS visits
FROM 
  (SELECT * FROM `123456789.ga_sessions_20170219` UNION ALL
   SELECT * FROM `123456789.ga_sessions_20170220`)
WHERE
  EXISTS(
    SELECT 1 FROM UNNEST(hits) h
    WHERE REGEXP_CONTAINS(h.page.pagePath, r"^/(apples|oranges).html.*"))
GROUP BY date

Upvotes: 3

Related Questions