Reputation: 558
I am uploading a newline-delimited JSON file from GCS to BigQuery. There are some fields in the JSON file which contain dicts for values, and I have no problem getting those values into BigQuery, as the nested fields are broken down into separate columns. So it all works if the following example is a line from the JSON file:
{"dict_field": {"value1": 1, "value2": 2}}
However, if one line from the file has an empty dict as the value for field_dict, like this:
{"dict_field": {}}
I get the following error message:
Exception: BigQuery job failed. Final error was: {'reason': 'invalid', 'message': "Unsupported empty struct type for field 'dict_field'"} [...]
I looked through the BigQuery documentation and couldn't find any stated restriction regarding empty dicts as values. Does anyone know if there is a workaround to this issue, or if I have to manually clean the data before importing it in BigQuery?
Upvotes: 2
Views: 2921
Reputation: 103
Wild that this isn't documented, thanks to this answer I was reassured I wasn't taking crazy pills.
In my situation, I am attempting to load Shopify data into BigQuery from Python. This recursive function helped clear up empty dicts by replacing them with None
:
def replace_blank_dict(d):
if not d:
return None
if type(d) is list:
for list_item in d:
if type(list_item) is dict:
for k, v in list_item.items():
list_item[k] = replace_blank_dict(v)
if type(d) is dict:
for k, v in d.items():
d[k] = replace_blank_dict(v)
return d
Called as such, when looping through a list of Orders from the Shopify REST API:
for order in orders:
for k, v in order.items():
order[k] = replace_blank_dict(v)
Upvotes: 2
Reputation: 558
As pointed out by @rodvictor and @Fcojavmelo, loading empty dicts from JSON files isn't currently possible although it' not explicitly mentioned anywhere in the BigQuery documentation, only in this issue/feature request.
In conclusion, the data has to be manually cleaned, and any empty dicts removed to avoid errors.
Upvotes: 6