Little Brain
Little Brain

Reputation: 2855

How to SELECT from events_* AND events_intraday_* tables using SQL in Google BigQuery

I'm new to BigQuery and SQL. I'm trying to return data from Google's BigQuery. I have a query that can get data from tables of form EITHER events_* OR events_intraday_*, but I'm at an early stage and would really like to pull all the data from both sets of tables with a single query. It seems like this should be trivial but nothing I've tried has worked. I can't find an example in any doc which explains how to combine UNNEST with multiple tables in Google SQL.

Here's my original working query:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 1 error: Column name event_params is ambiguous at [17:15]

Variation 2:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  `analytics_356435236.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param,
  `analytics_356435236.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 2 error: Column name event_params is ambiguous at [19:15]

Variation 3:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param),
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48] SQL

Variation 4:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM
  (`analytics_xxx.events_intraday_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
JOIN
  (`analytics_xxx.events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param)
WHERE
  _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  AND event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]

...and so on. I've tried a dozen other variations, but nothing has worked. Many thanks for any help!

Upvotes: 0

Views: 1379

Answers (3)

ibraheem qanah
ibraheem qanah

Reputation: 23

you can use with

WITH 
events as (SELECT * FROM `analytics_xxx.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'),
events_intraday as (SELECT * FROM `analytics_xxx.events_intraday_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}')
all_events as (SELECT * from events UNION ALL SELECT * from events_intraday)
SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM all_events,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

Upvotes: 0

Chung David
Chung David

Reputation: 45

@Jaytiger

I'm writing a post because I don't have enough rep to comment. The intraday only gets 'merged' to the event table when the current day is over.

Therefore, if you want current real-time data, you have to use intraday.

@OP. Here's the general idea. You don't need to filter your intra days by month and year. The previous intra days are automatically deleted. But to be safe, use the latest one.

select *
from
(
    (select * 
    from `analytics_fillyourid.events_*` 
    where _TABLE_SUFFIX BETWEEN 
        FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) 
        and FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
    
    UNION ALL

    (select * 
    from `analytics_fillyourid.events_intraday_*`
    where _TABLE_SUFFIX = FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
)

EDIT 1: (Some SQL History)

In Standard SQL it is impossible to union two tables using one Select statement.

However in Legacy SQL, you can:

SELECT * FROM TABLE A,TABLE B 

The same syntax in Standard SQL will perform a cross-join instead. (not what you want)

Upvotes: 0

Little Brain
Little Brain

Reputation: 2855

Just to see what would happen, I asked ChatGPT, which gave me the following code that seems to work perfectly:

SELECT
  user_id as user_id,
  TIMESTAMP_MICROS(event_timestamp) as timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country as country,
  traffic_source.source as source,
  traffic_source.medium as medium,
  device.category as device,
  device.web_info.browser as browser,
  device.operating_system as os
FROM (
  SELECT * FROM `analytics_xxx.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
  
  UNION ALL
  
  SELECT * FROM `analytics_xxx.events_intraday_*`
  WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
), UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  event_name = 'experiment_viewed'  
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_id is not null

I had previously tried "UNION ALL" but the key change seems to be splitting the "WHERE" into a section for selecting the table by date suffix, and a selection for selecting the record parameters.

Upvotes: 1

Related Questions