Reputation: 91
I have a table like this:
[ID] [Country_Region] [Date1] [Date2] [Date3] ... [Date150]
I wrote a SQL query in BigQuery that returns a dataset like this:
[ID] [Country_Region] [Date.ColumnName] [Date.Value]
-----------------------------------------------------
1 China _1_22_20 12
2 China _1_23_20 34
[3 China _1_24_20 54] <----- I want this row and all the next ones to appear as well
My SQL query:
WITH timestamped_table AS
(
SELECT
Country_Region,
[STRUCT<timestamp STRING, timestamp_value INT64>
('2020-2-22', _2_22_20),
('2020-2-23', _2_23_20)] AS timestamp_data
FROM
`username.bq_timeseries_covid19.recovered_global`
WHERE
Country_Region = 'China' AND Province_State = 'Zhejiang'
)
SELECT
Country_Region, timestamp
FROM
timestamped_table
CROSS JOIN
UNNEST(timestamped_table.timestamp_data) AS timestamp
Now the only problem I have is that I have to extend these lines from the query:
STRUCT<timestamp STRING, timestamp_value INT64>
('2020-2-22', _2_22_20),
('2020-2-23', _2_23_20)
So that they include all the date columns.
I would imagine it to be something like this if I would be extending it by some script(which I think is not the best idea?):
STRUCT<timestamp STRING, timestamp_value INT64>
for column in get_date_columns():
(timestamp_from_columnname(), column)
I think that writing a SQL script/function would be better but I don't know how to start with this exactly
Upvotes: 1
Views: 66
Reputation: 173121
Below is for BigQuery Standard SQL and DOES NOT require you to list columns explicitly no matter how many columns you have - 50, 50 or 150 - it should work w/o any changes in below code
#standardSQL
SELECT id, Country_Region,
STRUCT(
SPLIT(kv,':')[OFFSET(0)] AS ColumnName,
SPLIT(kv,':')[OFFSET(1)] AS Value
) AS `Date`
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), '[{}"]', ''))) kv
WHERE SPLIT(kv,':')[OFFSET(0)] NOT IN ('id', 'Country_Region')
You can test, play with above using dummy data as in below simplified example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'China' AS Country_Region, 12 AS _1_22_20, 34 AS _1_23_20, 54 AS _1_24_20 UNION ALL
SELECT 2 AS id, 'Italy' AS Country_Region, 22 AS _1_22_20, 44 AS _1_23_20, 64 AS _1_24_20
)
SELECT id, Country_Region,
STRUCT(
SPLIT(kv,':')[OFFSET(0)] AS ColumnName,
SPLIT(kv,':')[OFFSET(1)] AS Value
) AS `Date`
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), '[{}"]', ''))) kv
WHERE SPLIT(kv,':')[OFFSET(0)] NOT IN ('id', 'Country_Region')
with output
Row id Country_Region Date.ColumnName Date.Value
1 1 China _1_22_20 12
2 1 China _1_23_20 34
3 1 China _1_24_20 54
4 2 Italy _1_22_20 22
5 2 Italy _1_23_20 44
6 2 Italy _1_24_20 64
Upvotes: 3
Reputation: 1270713
You need to list the columns explicitly. But you don't need the CTE:
SELECT rg.Country_Region, el.timestamp, el.timestamp_value
FROM `username.bq_timeseries_covid19.recovered_global` rg CROSS JOIN
UNNEST([STRUCT<timestamp STRING, timestamp_value INT64>
('2020-2-22', _2_22_20),
('2020-2-23', _2_23_20)
]) el
WHERE rg.Country_Region = 'China' and
rg.Province_State = 'Zhejiang';
Actually, there are some tricks using JSON and string manipulation, but they are quite hacky and not worth the effort (in my opinion).
Upvotes: 1