Reputation: 87
I have a very big table in BigQuery formatted as follows:
row | column.key | column.value.string_value | column.value.int_value | column.value.float_value | column.value.boolean_value | id |
---|---|---|---|---|---|---|
1 | key1 | aa | null | null | null | id1 |
key2 | null | null | null | false | ||
key3 | fa | null | null | null | ||
key4 | null | null | null | true | ||
2 | key1 | ab | null | null | null | id1 |
key2 | null | null | null | false | ||
key3 | gf | null | null | null | ||
key4 | null | null | null | false | ||
3 | key1 | af | null | null | null | id2 |
key2 | null | null | null | true | ||
key3 | fa | null | null | null | ||
key4 | null | null | null | false |
I need to re-format it as follows:
row | key1 | key2 | key3 | key4 | id |
---|---|---|---|---|---|
1 | aa | false | fa | true | id1 |
2 | ab | false | gf | false | id1 |
3 | af | true | fa | false | id2 |
I tried to use this but couldn't make it work in my table
How to unnest and pivot two columns in BigQuery
One important thing in my table is that, like in the above table, the ids repeat in different rows but have different values for the same keys. Where the ids repeat I want to still have all incidences of the ids, like in the formatted table example.
Also I put only 4 keys here but I actually have 50, so the least manual it is, the better. If the only way is writing each column manually then I'll still do it though.
Does anyone know how to do it? I basically need to turn every key into a new column with it's corresponding value associated with the ids, but I don't know how to do it.
Thank you very much!
Upvotes: 0
Views: 1934
Reputation: 12264
In order to re-format it, firstly you need to merge column.value
struct into one single column value. This can be easily done using COALESCE
function since usually there is only one non-null value in column.value
struct in Google Analytics.
COALESCE(
value.string_value,
'' || value.int_value, -- to covert int64 to string
'' || value.float_value,
'' || value.boolean_value
) value,
After that, you can simply reshape it using PIVOT
query.
SELECT * FROM (
SELECT t.* EXCEPT(column),
key,
COALESCE(
value.string_value,
'' || value.int_value,
'' || value.float_value,
'' || value.boolean_value
) value,
FROM sample_table t, UNNEST(column)
) PIVOT (ANY_VALUE(value) FOR key IN ('key1', 'key2', 'key3', 'key4'));
Query results
sample_table
WITH sample_table AS (
SELECT 1 row,
'id1' id,
[STRUCT('key1' AS key, STRUCT('aa' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value)] column
UNION ALL
SELECT 2 row, 'id1' id,
[STRUCT('key1' AS key, STRUCT('ab' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
STRUCT('key3' AS key, STRUCT('gf' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
UNION ALL
SELECT 3 row, 'id2' id,
[STRUCT('key1' AS key, STRUCT('af' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value),
STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
)
You can further generalize it with a dynamic query if you have many keys in your table refering to the link in your question or below.
Upvotes: 1