user13498912
user13498912

Reputation: 3

BigQuery Google Analytics: number of sessions that visited a set of pages or had specific custom dimension values

Using Google Analytics data in Google BigQuery, I am trying to recreate a GA segment and pull basic metrics like sessions. The segment definition is: custom dimension A = 1 or 5 or custom dimension B had values or Page = pageA or Page = pageB

The following code works well with the two custom dimension conditions. Once I added the page portion, the result was always all sessions for the day.

SELECT 

  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS Sessions
    FROM
    `123456789.ga_sessions_20200201` as Data,
    unnest(hits) as hits

  WHERE totals.visits = 1
-- custom dimension A
  and (
       (SELECT
          value
        FROM
          UNNEST(hits.customDimensions)
        WHERE
          index = 9
        GROUP BY
          1) is not null
-- custom dimension B
    or (SELECT
          value
        FROM
          UNNEST(hits.customDimensions)
        WHERE
          index = 10
        GROUP BY
          1) in ('1','5') 
-- Page
  or Exists(Select
            hits.page.pagePath AS Page
         FROM
           `123456789.ga_sessions_20200201` ,
            unnest(hits) as hits
         Where totals.visits = 1
              AND hits.type = 'PAGE'
              and hits.page.pagePath in ('pageA','pageB'))
       )

Upvotes: 0

Views: 1609

Answers (1)

Sabri Karagönen
Sabri Karagönen

Reputation: 2365

Your mistake is, you're using ga_sessions table in the third condition. Therefore, it scans the all table and exists condition returns True. So, it becomes true for all the rows.

Also, you don't have to join unnest(hits). It creates multiple rows for each session. If you handle it without joining nested hits, you'll have a single row for each session. So, it'll be easier to count them.

I updated it also, simplified the query, so I think this will give you what you want in your dataset.

SELECT 
    COUNT(*) AS Sessions
FROM
    `123456789.ga_sessions_20200201` as Data

WHERE totals.visits = 1
    and exists (
        SELECT
          1
        FROM
          UNNEST(hits) as hit
        WHERE
          EXISTS (select 1 from unnest(hit.customDimensions) where index = 9 and value is not null) -- custom dimension A
          OR EXISTS (select 1 from unnest(hit.customDimensions) where index = 10 and value in ('1', '5')) -- custom dimension B
          OR (hit.type = 'PAGE' and hit.page.PagePath in ('pageA', 'pageB')) -- Page
    )

Upvotes: 1

Related Questions