user11465943
user11465943

Reputation:

What is the most optimal way (processing wise) to combine two BigQuery queries?

I have two valid queries that I need to combine to produce iOS and Android results in the same table. I tried 'UNION ALL' and this will process 217 GB (shown below). I think I could get by with the data I need if I omit selecting the fields h.eventInfo.eventCategory and h.eventinfo.eventaction, however this only reduces the amount processed to 177 GB. What is the best way to optimize a query like this?

SELECT
  date,
  'iOS' as app_source,
  h.eventInfo.eventCategory,
  h.eventinfo.eventaction,
  h.eventinfo.eventlabel,
  COUNT(1) events,
  COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string))) uniqueEvents
FROM
   `xxxx.ga_sessions_*`,
  UNNEST(hits) h
WHERE
  h.type='EVENT'
  and (h.eventInfo.eventCategory = 'Live' and h.eventInfo.eventLabel = 'Team Chat')
  or (h.eventInfo.eventCategory = 'Messages' and h.eventInfo.eventLabel = 'Direct Message')
  and date >= "20190801"
GROUP BY
  1,2,3,4,5
  ORDER BY
  date

union all

SELECT
 date,
  'Android' as app_source,
  h.eventInfo.eventCategory,
  h.eventinfo.eventaction,
  h.eventinfo.eventlabel,
  COUNT(1) events,
  COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string))) uniqueEvents
FROM
   `xxxx.ga_sessions_*`,
  UNNEST(hits) h
WHERE
  h.type='EVENT'
  and (h.eventInfo.eventCategory = 'Live' and h.eventInfo.eventLabel = 'Team Chat')
  or (h.eventInfo.eventCategory = 'Messages' and h.eventInfo.eventLabel = 'Direct Message')
  and date >= "20190801"
GROUP BY
  1,2,3,4,5
  ORDER BY
  date

Upvotes: 0

Views: 62

Answers (1)

rtenha
rtenha

Reputation: 3616

Instead of date >= '20190801', you should use _TABLE_SUFFIX >= '20190801'. You have a sharded table and are using the wildcard *. The _TABLE_SUFFIX serves to filter your shards. Note, I am assuming date belongs to the session, and not the nested hit.

Querying multiple tables using a wildcard table

Upvotes: 2

Related Questions