Vibhor Jain
Vibhor Jain

Reputation: 1476

Recommendation on Query Efficiency : 2 different versions

which of these is more efficient query to run:

one where the INCLUDE / DON'T INCLUDE filter condition in WHERE clause and tested for each row

SELECT distinct fullvisitorid
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t, unnest(hits) as ht
WHERE (select max(if(cd.index = 1,cd.value,null))from unnest(ht.customDimensions) cd) 
       = 'high_worth'

one returning all rows and then outer SELECT clause doing all filtering test to INCLUDE / DON'T INCLUDE

SELECT distinct fullvisitorid
FROM 
    (
     SELECT 
        fullvisitorid
        , (select max(if(cd.index = 1,cd.value,null)) FROM unnest(ht.customDimensions) cd) hit_cd_1
     FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t
          , unnest(hits) as ht
     )
WHERE
   hit_cd_1 = 'high_worth'

Both produce exactly same results! the goal is: list of fullvisitorId, who ever sent hit Level Custom Dimension (index =1) with value = 'high_worth' users ()

Thanks for your inputs!

Cheers!

/Vibhor

Upvotes: 0

Views: 71

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

As of your original two queries: obviously - they are identical even though you slightly rearranged appearance. so from those two you should choose whatever easier for you to read/maintain. I would pick first query - but it is really matter of personal preferences

Meantime, try below (BigQuery Standard SQL) - it looks slightly optimized to me - but I didn't have chance to test on real data

SELECT DISTINCT fullvisitorid
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t, 
UNNEST(hits) AS ht, UNNEST(ht.customDimensions) cd
WHERE cd.index = 1 AND cd.value = 'high_worth'  

Obviously - it should produce same result as your two queries
Execution plan looks better to me and it (query) is faster is much easier to read / manage

Upvotes: 0

Ying Li
Ying Li

Reputation: 2519

I tried the two queries and compared their explanations, they are identical. I am assuming some sort of optimization magic occurs prior to the query being ran.

Upvotes: 1

Related Questions