Reputation: 53
I have a Python dictionary that looks like this:
{
    'id': 123,
    'categories': [
        {'category': 'fruit', 'values': ['apple', 'banana']},
        {'category': 'animal', 'values': ['cat']},
        {'category': 'plant', 'values': []}
    ]
}
I am trying to insert those values into a table in BigQuery via the API using Python. I just need to format the above into an "INSERT table VALUES" query. The table needs to have the fields: id, categories.category, and categories.values.
I need categories to basically be an array with the category and each category's corresponding values. The table is supposed to look sort of like this in the end, except I need it to be just one row per id, with the corresponding category fields nested and having the proper field names:
SELECT 123 as id, (["fruit"], ["apple", "banana"]) as category
UNION ALL (SELECT 123 as id, (["animal"], ["cat"]) as category)
UNION ALL (SELECT 123 as id, (["plant"], ["tree", "bush", "rose"]) as category)
I'm not really sure how to format the "INSERT" query to get the desired result. Can anyone help?
Upvotes: 3
Views: 7774
Reputation: 4032
If you want to load a dictionary to BigQuery using Python, you first have to prepare your data. I chose to convert the Python dictionary to a .json file and then load it to BigQuery using the Python API. However, according to the documentation, BigQuery has some limitations regarding loading nested .json data, among them the requirement that the source file be newline delimited, with one complete JSON object per line.
For this reason, some modifications should be done to the file before you can load it to BigQuery. I have created two scripts: the first converts the Python dict to a JSON file, and in the second the JSON file is reformatted as newline-delimited JSON and then loaded to BigQuery.
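For reference, newline-delimited JSON simply puts one complete record per line; two hypothetical records, just to show the shape:
{"id": 123, "categories": [{"category": "fruit", "values": ["apple", "banana"]}]}
{"id": 456, "categories": [{"category": "animal", "values": ["cat"]}]}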
First, convert the Python dict to a .json file. Notice that you have to wrap the whole data in [] so that the file contains a JSON array:
import json

# Wrap the whole structure in a list so the file holds a JSON array of records
py_dict = [{
    'id': 123,
    'categories': [
        {'category': 'fruit', 'values': ['apple', 'banana']},
        {'category': 'animal', 'values': ['cat']},
        {'category': 'plant', 'values': []}
    ]
}]

# Write the records to a .json file
with open('json_data.json', 'w') as out_file:
    json.dump(py_dict, out_file)
Second, convert the JSON to newline-delimited JSON and load it to BigQuery:
import json
from google.cloud import bigquery

# Read the JSON array back in and serialize each record as its own line
with open("json_data.json", "r") as read_file:
    data = json.load(read_file)

result = [json.dumps(record) for record in data]

with open('nd-processed.json', 'w') as obj:
    for i in result:
        obj.write(i + '\n')

# Load the newline-delimited file into BigQuery
client = bigquery.Client()

dataset_id = 'sample'
table_id = 'json_mytable'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("nd-processed.json", "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()  # Waits for table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
Then, in the BigQuery UI, you can query your table as follows:
SELECT id, categories
FROM `test-proj-261014.sample.json_mytable`, UNNEST(categories) AS categories
The output then shows one row per category, each with the id and that category's values.
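If you also want the individual values flattened out, you can unnest twice; a small sketch against the same table, assuming the autodetected struct field names category and values:
SELECT id, c.category, v AS value
FROM `test-proj-261014.sample.json_mytable`, UNNEST(categories) AS c, UNNEST(c.values) AS v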
Upvotes: 11
Reputation: 172944
You can use the query below, with your dictionary text embedded into it:
#standardSQL
WITH data AS (
  SELECT '''
  {
    'id': 123,
    'categories': [
      {'category': 'fruit', 'values': ['apple', 'banana']},
      {'category': 'animal', 'values': ['cat']},
      {'category': 'plant', 'values': ['tree', 'bush', 'rose']}
    ]
  }
  ''' dict
)
SELECT
  JSON_EXTRACT_SCALAR(dict, '$.id') AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(cat, '$.category') AS category,
      ARRAY(
        SELECT TRIM(val, '"')
        FROM UNNEST(JSON_EXTRACT_ARRAY(cat, '$.values')) val
      ) AS `values`
    FROM UNNEST(JSON_EXTRACT_ARRAY(dict, '$.categories')) cat
  ) AS categories
FROM data
which produces the result below:
Row  id   categories.category  categories.values
1    123  fruit                apple
                               banana
          animal               cat
          plant                tree
                               bush
                               rose
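Since the question asked for an INSERT rather than a SELECT, you can prepend an INSERT clause to the same query; a sketch, assuming a destination table yourproject.yourdataset.mytable (a placeholder name) whose schema has an INT64 id and the categories array of structs:
#standardSQL
INSERT INTO `yourproject.yourdataset.mytable` (id, categories)
WITH data AS (
  SELECT '''
  {
    'id': 123,
    'categories': [
      {'category': 'fruit', 'values': ['apple', 'banana']},
      {'category': 'animal', 'values': ['cat']},
      {'category': 'plant', 'values': ['tree', 'bush', 'rose']}
    ]
  }
  ''' dict
)
SELECT
  -- JSON_EXTRACT_SCALAR returns a STRING; cast it if your id column is INT64
  CAST(JSON_EXTRACT_SCALAR(dict, '$.id') AS INT64) AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(cat, '$.category') AS category,
      ARRAY(
        SELECT TRIM(val, '"')
        FROM UNNEST(JSON_EXTRACT_ARRAY(cat, '$.values')) val
      ) AS `values`
    FROM UNNEST(JSON_EXTRACT_ARRAY(dict, '$.categories')) cat
  ) AS categories
FROM data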
Upvotes: 1