Reputation: 43
we're trying to use script for a funnel for google BigQuery that Doug Mackenzie created linked here:http://online-behavior.com/analytics/funnel-analysis
We have run into some problems with our SQL call.
We're trying to use a filter from a hits.customDimensions.index = 20 where the value is 1. These values are always present so a simple filter such as the ones used as examples in his introduction is unfortunately not adequate.
A filter such as:
WHERE hits.customDimensions.index = '20'
Does not filter what we need to have filtered.
Rather we need (we imagine) something like this:
WHERE MAX(IF(hits.customDimensions.index=20,hits.customDimensions.value,NULL)) WITHIN hits as locked= '1'
This is how we've tried doing it:
FROM (
SELECT
fullVisitorId,
visitId,
MIN(hits.hitNumber) AS firstHit,
MAX(IF(hits.isExit, 1, 0)) AS exit
FROM(
SELECT
*,
MAX(IF(hits.customDimensions.index=20,hits.customDimensions.value,NULL)) WITHIN hits as locked,
FROM
TABLE_DATE_RANGE([[dataset.ga_sessions_], TIMESTAMP('2017-06-01'), TIMESTAMP('2017-06-04'))
)
WHERE
REGEXP_MATCH(hits.page.pagePath, '^/plus/.*/[0-9]')
AND totals.visits = 1
AND locked= '1'
GROUP BY
fullVisitorId,
visitId) s0
This returns the error:
Error: Cannot query the cross product of repeated fields customDimensions.index and hits.kunde. ; Cannot query the cross product of repeated fields customDimensions.value and hits.kunde.
We have seen that there are some suggestions to use FLATTEN
on problems with the above mentioned error. However this did seem to give syntax errors:
FROM (
SELECT
fullVisitorId,
visitId,
MIN(hits.hitNumber) AS firstHit,
MAX(IF(hits.isExit, 1, 0)) AS exit,
MAX(IF(hits.customDimensions.index=20,hits.customDimensions.value,NULL)) WITHIN hits AS locked
FROM
FLATTEN(TABLE_DATE_RANGE([[dataset.ga_sessions_], TIMESTAMP('2017-06-01'), TIMESTAMP('2017-06-04')),hits.customDimension)
WHERE
REGEXP_MATCH(hits.page.pagePath, '^/plus/.*/[0-9]')
AND totals.visits = 1
AND locked = '1'
GROUP BY
fullVisitorId,
visitId) s0
Just gives the error: Error: Not found: Table eb-bigquery:3917183.ga_sessions_
So any suggestions to how we can get the filter to work would be much appreciated.
Upvotes: 0
Views: 191
Reputation: 11777
BigQuery has evolved a lot since this blog was written and now we have more advanced techniques to process data.
That's the main reason why I asked if you could use the Standard Version of BigQuery as solving this would be easier there. For instance:
SELECT
fullvisitorid,
visitid,
(select min(hitNumber) from unnest(hits) where regexp_contains(page.pagePath, r'^/plus/.*/[0-9]')) firstHit,
(select max(isExit) from UNNEST(hits) where regexp_contains(page.pagePath, r'^/plus/.*/[0-9]')) exitFlag
FROM `table`
WHERE 1 = 1
AND EXISTS(select 1 from unnest(hits) where regexp_contains(page.pagePath, r'^/plus/.*/[0-9]'))
AND EXISTS(select 1 from unnest(hits) hits where (select count(value) from unnest(hits.customDimensions) custd where index=20) > 0)
AND totals.visits = 1
This query does the same thing and runs faster than the Legacy ones as it avoids first unnesting all repeated fields and then processing the data.
Upvotes: 2