Reputation: 2855
I'm new to BigQuery and SQL. I'm trying to return data from Google's BigQuery. I have a query that can get data from tables of form EITHER events_* OR events_intraday_*, but I'm at an early stage and would really like to pull all the data from both sets of tables with a single query. It seems like this should be trivial but nothing I've tried has worked. I can't find an example in any doc which explains how to combine UNNEST with multiple tables in Google SQL.
Here's my original working query:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_xxx.events_intraday_*`, `analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 1 error: Column name event_params is ambiguous at [17:15]
Variation 2:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
`analytics_356435236.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param,
`analytics_356435236.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 2 error: Column name event_params is ambiguous at [19:15]
Variation 3:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param),
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 3 error: Syntax error: Expected keyword JOIN but got "," at [16:48] SQL
Variation 4:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM
(`analytics_xxx.events_intraday_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
JOIN
(`analytics_xxx.events_*`,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param)
WHERE
_TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
AND event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Variation 4 error: Syntax error: Expected keyword JOIN but got "," at [16:48]
...and so on. I've tried a dozen other variations, but nothing has worked. Many thanks for any help!
Upvotes: 0
Views: 1379
Reputation: 23
you can use with
WITH
events as (SELECT * FROM `analytics_xxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'),
events_intraday as (SELECT * FROM `analytics_xxx.events_intraday_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}')
all_events as (SELECT * from events UNION ALL SELECT * from events_intraday)
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM all_events,
UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
Upvotes: 0
Reputation: 45
@Jaytiger
I'm writing a post because I don't have enough rep to comment. The intraday only gets 'merged' to the event table when the current day is over.
Therefore, if you want current real-time data, you have to use intraday.
@OP. Here's the general idea. You don't need to filter your intra days by month and year. The previous intra days are automatically deleted. But to be safe, use the latest one.
select *
from
(
(select *
from `analytics_fillyourid.events_*`
where _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
and FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
UNION ALL
(select *
from `analytics_fillyourid.events_intraday_*`
where _TABLE_SUFFIX = FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
)
EDIT 1: (Some SQL History)
In Standard SQL it is impossible to union two tables using one Select statement.
However in Legacy SQL, you can:
SELECT * FROM TABLE A,TABLE B
The same syntax in Standard SQL will perform a cross-join instead. (not what you want)
Upvotes: 0
Reputation: 2855
Just to see what would happen, I asked ChatGPT, which gave me the following code that seems to work perfectly:
SELECT
user_id as user_id,
TIMESTAMP_MICROS(event_timestamp) as timestamp,
experiment_id_param.value.string_value AS experiment_id,
variation_id_param.value.int_value AS variation_id,
geo.country as country,
traffic_source.source as source,
traffic_source.medium as medium,
device.category as device,
device.web_info.browser as browser,
device.operating_system as os
FROM (
SELECT * FROM `analytics_xxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
UNION ALL
SELECT * FROM `analytics_xxx.events_intraday_*`
WHERE _TABLE_SUFFIX BETWEEN '{{startYear}}{{startMonth}}{{startDay}}' AND '{{endYear}}{{endMonth}}{{endDay}}'
), UNNEST(event_params) AS experiment_id_param,
UNNEST(event_params) AS variation_id_param
WHERE
event_name = 'experiment_viewed'
AND experiment_id_param.key = 'experiment_id'
AND variation_id_param.key = 'variation_id'
AND user_id is not null
I had previously tried "UNION ALL" but the key change seems to be splitting the "WHERE" into a section for selecting the table by date suffix, and a selection for selecting the record parameters.
Upvotes: 1