BigQuery get columns from JSON file keys

Question

I am facing an issue related to a project of mine. Here is the summary of what i would like to do :

I have multiple files to process daily (2Go each) with the following extract (no header) :

{"id":"abcdefg","key1":{"key2":{"audiences":{"aud1":1599033396,"aud2":1599030775},"segments":{"seg1":1599033245,"seg2":1599030764,"seg3":1599030764,"seg4":1599030188}}}}

I would like to get this result using bigquery :

id|audiences|segments
abcdefg|aud1|seg1
abcdefg|aud1|seg2
abcdefg|aud1|seg3
abcdefg|aud1|seg4
abcdefg|aud2|seg1
abcdefg|aud2|seg2
abcdefg|aud2|seg3
abcdefg|aud2|seg4

Where the first column contains the value of the key id and the second and third columns contain all the possible combinations between the audiences (keys) and the segments (keys) (in other words, what are the segments contained in each audience).

I would like to do this in order to calculate the number of ids in each segment on each audience.

This is totally feasible with some loops and JSON parsing. However, since i need this to be processed fast, i am thinking to use bigquery in order to achieve this task.

I have a first script that uploads the file in a bq table (every line is a string) and extracts the fields but i do not know how to get the final result.

Is there any way to do this ?

Thanks to all for your help.

Regards

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL

#standardSQL 
SELECT 
  JSON_EXTRACT_SCALAR(line, '$.id') id,
  TRIM(SPLIT(aud_kv, ':')[OFFSET(0)], '"') audiences,
  TRIM(SPLIT(seg_kv, ':')[OFFSET(0)], '"') segments
FROM `project.dataset.table`,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.audiences'),'{}'))) aud_kv,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.segments'),'{}'))) seg_kv

if to apply to sample data from your question - output is

Row id      audiences   segments     
1   abcdefg aud1        seg1     
2   abcdefg aud1        seg2     
3   abcdefg aud1        seg3     
4   abcdefg aud1        seg4     
5   abcdefg aud2        seg1     
6   abcdefg aud2        seg2     
7   abcdefg aud2        seg3     
8   abcdefg aud2        seg4

BigQuery get columns from JSON file keys

Answers (1)

Related Questions