Reputation:
I have two valid queries that I need to combine to produce iOS and Android results in the same table. I tried 'UNION ALL' and this will process 217 GB (shown below). I think I could get by with the data I need if I omit selecting the fields h.eventInfo.eventCategory and h.eventinfo.eventaction, however this only reduces the amount processed to 177 GB. What is the best way to optimize a query like this?
SELECT
date,
'iOS' as app_source,
h.eventInfo.eventCategory,
h.eventinfo.eventaction,
h.eventinfo.eventlabel,
COUNT(1) events,
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string))) uniqueEvents
FROM
`xxxx.ga_sessions_*`,
UNNEST(hits) h
WHERE
h.type='EVENT'
and (h.eventInfo.eventCategory = 'Live' and h.eventInfo.eventLabel = 'Team Chat')
or (h.eventInfo.eventCategory = 'Messages' and h.eventInfo.eventLabel = 'Direct Message')
and date >= "20190801"
GROUP BY
1,2,3,4,5
ORDER BY
date
union all
SELECT
date,
'Android' as app_source,
h.eventInfo.eventCategory,
h.eventinfo.eventaction,
h.eventinfo.eventlabel,
COUNT(1) events,
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string))) uniqueEvents
FROM
`xxxx.ga_sessions_*`,
UNNEST(hits) h
WHERE
h.type='EVENT'
and (h.eventInfo.eventCategory = 'Live' and h.eventInfo.eventLabel = 'Team Chat')
or (h.eventInfo.eventCategory = 'Messages' and h.eventInfo.eventLabel = 'Direct Message')
and date >= "20190801"
GROUP BY
1,2,3,4,5
ORDER BY
date
Upvotes: 0
Views: 62
Reputation: 3616
Instead of date >= '20190801'
, you should use _TABLE_SUFFIX >= '20190801'
. You have a sharded table and are using the wildcard *
. The _TABLE_SUFFIX
serves to filter your shards. Note, I am assuming date belongs to the session, and not the nested hit.
Querying multiple tables using a wildcard table
Upvotes: 2