Reputation: 23
I am trying to come up with a way to locate visits where a user went to a certain hits.page.hostname, but the previous hits.page.hostname is not a certain host. I was looking to use GROUP_CONCAT (like shown at 'Immediate Follow' Page Path in BigQuery) and regex with lookbehind, but I discovered that RE2 doesn't have lookbehind so I'm back to square one.
Has anyone got any ideas?
Upvotes: 1
Views: 184
Reputation: 173046
Below is attempt to "save" your original idea of using GROUP_CONCAT with later REGEX_MATCH
Try below
#legacySQL
SELECT
date,
totals.visits AS totalvisits,
GROUP_CONCAT(hits.page.hostname, ">>") WITHIN RECORD AS sequence
FROM TABLE_DATE_RANGE([MySys:110909631.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -90, 'DAY'),
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))
HAVING REGEXP_MATCH(CONCAT('>>', sequence, '>>'), '>>(DesiredHost>>)+')
AND NOT REGEXP_MATCH(CONCAT('>>', sequence, '>>'), '>>(RejectHost>>)+(DesiredHost>>)+')
Above looks for visits with DesiredHost
in hits.page.hostname
REGEXP_MATCH(CONCAT('>>', sequence, '>>'), '>>(DesiredHost>>)+')
but excludes those visits where DesiredHost
is preceeded with RejectHost
REGEXP_MATCH(CONCAT('>>', sequence, '>>'), '>>(RejectHost>>)+(DesiredHost>>)+')
As you can see in both expressions above - you can fine tune length of visit within host or set any other more complex logic to look for. Hope you get an idea.
And hope you can get your initial idea back to work!
Upvotes: 0
Reputation: 33755
I think you'd have an easier time using standard SQL. You can look "backwards" through an array by using UNNEST WITH OFFSET
and then the array operator. For example,
#standardSQL
SELECT
date,
totals.visits AS total_visits
FROM
`MyProject.dataset.ga_sessions_*`
WHERE
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) BETWEEN
DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND
EXISTS (
SELECT 1 FROM UNNEST(hits) AS hit WITH OFFSET off
WHERE hit.page.hostname = 'foo' AND
NOT hits[SAFE_OFFSET(off - 1)].page.hostname = 'bar'
);
Upvotes: 1