Reputation: 45
I am having hard time parsing data from BigQuery in column format for analysis, help is really appreciated.
I have user_detail table with following schema
Field Name | Type | Mode | Description |
---|---|---|---|
user_id | INTEGER | NULLABLE | |
user_names | STRING | NULLABLE | {"dl":"512"} |
I have following sample data in this table
user_id | user_names |
---|---|
12829 | [{"_id":{"$oid":"5d650676af82eb0a30737e74"},"_type":"UserName","capture_date":{"$date":"2019-08-27T00:00:00.000+0000"},"source":"google","name1":"John","name3":"Doe","name4":"D/O Jane Doe","gender":"1","dob":{"$date":"1986-07-10T00:00:00.000+0000"}}] |
I want to get the data in following tabular format
select user_id, source, name1, name3
I have tried unnest operator but apparently because user_names column is in string format, I am not able to get it as it is not in array format (Values referenced in FROM clause must be arrays. user_names has type STRING at [5:10])
Can you please help me how can I accomplish this?
Upvotes: 2
Views: 1295
Reputation: 655
The following query is an example on how to do it:
with cte as (
select 12829 as user_id, '[{"_id":{"$oid":"5d650676af82eb0a30737e74"},"_type":"UserName","capture_date":{"$date":"2019-08-27T00:00:00.000+0000"},"source":"google","name1":"John","name3":"Doe","name4":"D/O Jane Doe","gender":"1","dob":{"$date":"1986-07-10T00:00:00.000+0000"}}]' as json_line
)
select
user_id,
json_value(json_line, '$[0].source') as source,
json_value(json_line, '$[0].name1') as name1,
json_value(json_line, '$[0].name3') as name3
from cte
You have to use $[0].ATTRIBUTE_NAME
because your JSON string is a list of dictionaries so you have to select the element to parse in this case the first one.
If you did not have the square brackets from the list you can just use $.ATTRIBUTE_NAME
.
Upvotes: 1