Reputation: 465
I have a BigQuery table with a STRUCT field, and I would like it to automatically extend its list of elements whenever I insert a previously unseen element. Is this possible?
-- initially meta only has elements: hair, eyes
CREATE TEMP TABLE tt AS
SELECT
1 AS id,
STRUCT (
'brown' AS hair,
'brown' AS eyes
) AS meta;
-- now I would like to add a never-before-seen element: weight
INSERT INTO tt
SELECT
2 AS id,
STRUCT (
'brown' AS hair,
160 AS weight
) AS meta;
This obviously does not work and returns the error: Query column 2 has type STRUCT<hair STRING, weight INT64> which cannot be inserted into column meta, which has type STRUCT<hair STRING, eyes STRING> at [10:1].
The resulting temp table looks like the following after the initial construction:
id | meta.hair | meta.eyes
---|-----------|----------
1  | brown     | brown
And then it would ideally automatically add the element "weight" to meta after inserting row 2:
id | meta.hair | meta.eyes | meta.weight
---|-----------|-----------|------------
1  | brown     | brown     | NULL
2  | brown     | NULL      | 160
This is probably wishful thinking.
As a real-world example, I know that Stitch's Webhook --> BigQuery integration is somehow achieving this behavior when it syncs data from some of our SaaS products into BigQuery. Stitch handles new, never-seen-before nested fields inside JSON payloads by adding new elements to corresponding STRUCT fields. I am just not sure how this magic is happening.
Upvotes: 1
Views: 435
Reputation: 172993
Assuming you have two sets of data (table_1 and table_2 below), consider the below approach:
create temp function json_extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function json_extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp table temp_table as (
  select id, key, value
  from (
    select id, to_json_string(meta) as json from table_1
    union all
    select id, to_json_string(meta) from table_2
  ), unnest(json_extract_keys(json)) key with offset
  join unnest(json_extract_values(json)) value with offset
  using(offset)
);
execute immediate(select '''
select id, struct(''' || string_agg(distinct key, ',') || ''') meta from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
);
with output matching the desired table in the question.
Upvotes: 2
Reputation: 4746
One way of doing this is with key-value fields in an array. Instead of naming the fields directly, you add a name field plus one field per data type you need:
CREATE TEMP TABLE tt AS
SELECT
1 AS id,
[
STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
STRUCT('eyes' as name, 'brown' AS str_value, null as int_value)
] AS meta;
INSERT INTO tt
SELECT
2 AS id,
[
STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
STRUCT('weight' as name, cast(null as string) AS str_value, 160 as int_value)
] AS meta;
select * from tt
Note that a bare NULL defaults to INT64, which is why int_value can be written as a plain null while str_value needs an explicit CAST(NULL AS STRING).
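Reading a value back out of this layout means searching the array for the matching name, which in SQL would be something like (SELECT str_value FROM UNNEST(meta) WHERE name = 'hair'). A minimal Node.js sketch of that lookup, using an in-memory copy of row 2 (the object shapes and the lookup helper are illustrative, not BigQuery API calls):

```javascript
// Row 2 from the INSERT above, as plain objects.
const row2 = {
  id: 2,
  meta: [
    { name: 'hair',   str_value: 'brown', int_value: null },
    { name: 'weight', str_value: null,    int_value: 160  },
  ],
};

// Find the entry with the given name and return whichever typed
// value is populated; null when the key is absent entirely.
function lookup(meta, name) {
  const entry = meta.find((e) => e.name === name);
  return entry ? (entry.str_value ?? entry.int_value) : null;
}

console.log(lookup(row2.meta, 'hair'));   // 'brown'
console.log(lookup(row2.meta, 'weight')); // 160
console.log(lookup(row2.meta, 'eyes'));   // null
```

The trade-off versus the pivot approach: the schema never changes (new keys are just new array entries), but every read needs an UNNEST and you must know which typed column to pull the value from.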
Upvotes: 2