Reputation: 1476
which of these is more efficient query to run:
one where the INCLUDE / DON'T INCLUDE filter condition in WHERE clause and tested for each row
SELECT distinct fullvisitorid
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t, unnest(hits) as ht
WHERE (select max(if(cd.index = 1,cd.value,null))from unnest(ht.customDimensions) cd)
= 'high_worth'
one returning all rows and then outer SELECT clause doing all filtering test to INCLUDE / DON'T INCLUDE
SELECT distinct fullvisitorid
FROM
(
SELECT
fullvisitorid
, (select max(if(cd.index = 1,cd.value,null)) FROM unnest(ht.customDimensions) cd) hit_cd_1
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t
, unnest(hits) as ht
)
WHERE
hit_cd_1 = 'high_worth'
Both produce exactly same results! the goal is: list of fullvisitorId, who ever sent hit Level Custom Dimension (index =1) with value = 'high_worth' users ()
Thanks for your inputs!
Cheers!
/Vibhor
Upvotes: 0
Views: 71
Reputation: 172993
As of your original two queries: obviously - they are identical even though you slightly rearranged appearance. so from those two you should choose whatever easier for you to read/maintain. I would pick first query - but it is really matter of personal preferences
Meantime, try below (BigQuery Standard SQL) - it looks slightly optimized to me - but I didn't have chance to test on real data
SELECT DISTINCT fullvisitorid
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t,
UNNEST(hits) AS ht, UNNEST(ht.customDimensions) cd
WHERE cd.index = 1 AND cd.value = 'high_worth'
Obviously - it should produce same result as your two queries
Execution plan looks better to me and it (query) is faster is much easier to read / manage
Upvotes: 0
Reputation: 2519
I tried the two queries and compared their explanations, they are identical. I am assuming some sort of optimization magic occurs prior to the query being ran.
Upvotes: 1