Reputation: 49
I am facing an issue related to a project of mine. Here is the summary of what i would like to do :
I have multiple files to process daily (2Go each) with the following extract (no header) :
{"id":"abcdefg","key1":{"key2":{"audiences":{"aud1":1599033396,"aud2":1599030775},"segments":{"seg1":1599033245,"seg2":1599030764,"seg3":1599030764,"seg4":1599030188}}}}
I would like to get this result using bigquery :
id|audiences|segments
abcdefg|aud1|seg1
abcdefg|aud1|seg2
abcdefg|aud1|seg3
abcdefg|aud1|seg4
abcdefg|aud2|seg1
abcdefg|aud2|seg2
abcdefg|aud2|seg3
abcdefg|aud2|seg4
Where the first column contains the value of the key id and the second and third columns contain all the possible combinations between the audiences (keys) and the segments (keys) (in other words, what are the segments contained in each audience).
I would like to do this in order to calculate the number of ids in each segment on each audience.
This is totally feasible with some loops and JSON parsing. However, since i need this to be processed fast, i am thinking to use bigquery in order to achieve this task.
I have a first script that uploads the file in a bq table (every line is a string) and extracts the fields but i do not know how to get the final result.
Is there any way to do this ?
Thanks to all for your help.
Regards
Upvotes: 0
Views: 1030
Reputation: 173171
Below is for BigQuery Standard SQL
#standardSQL
SELECT
JSON_EXTRACT_SCALAR(line, '$.id') id,
TRIM(SPLIT(aud_kv, ':')[OFFSET(0)], '"') audiences,
TRIM(SPLIT(seg_kv, ':')[OFFSET(0)], '"') segments
FROM `project.dataset.table`,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.audiences'),'{}'))) aud_kv,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.segments'),'{}'))) seg_kv
if to apply to sample data from your question - output is
Row id audiences segments
1 abcdefg aud1 seg1
2 abcdefg aud1 seg2
3 abcdefg aud1 seg3
4 abcdefg aud1 seg4
5 abcdefg aud2 seg1
6 abcdefg aud2 seg2
7 abcdefg aud2 seg3
8 abcdefg aud2 seg4
Upvotes: 2