felipecgonc
felipecgonc

Reputation: 558

Load JSON file to BigQuery with empty dict as a value

I am uploading a newline-delimited JSON file from GCS to BigQuery. There are some fields in the JSON file which contain dicts for values, and I have no problem getting those values into BigQuery, as the nested fields are broken down into separate columns. So it all works if the following example is a line from the JSON file:

{"dict_field": {"value1": 1, "value2": 2}}

However, if one line from the file has an empty dict as the value for field_dict, like this:

{"dict_field": {}}

I get the following error message:

Exception: BigQuery job failed. Final error was: {'reason': 'invalid', 'message': "Unsupported empty struct type for field 'dict_field'"} [...]

I looked through the BigQuery documentation and couldn't find any stated restriction regarding empty dicts as values. Does anyone know if there is a workaround to this issue, or if I have to manually clean the data before importing it in BigQuery?

Upvotes: 2

Views: 2921

Answers (2)

ryantkelly
ryantkelly

Reputation: 103

Wild that this isn't documented, thanks to this answer I was reassured I wasn't taking crazy pills.

In my situation, I am attempting to load Shopify data into BigQuery from Python. This recursive function helped clear up empty dicts by replacing them with None:

def replace_blank_dict(d):
    if not d:
        return None
    if type(d) is list:
        for list_item in d:
            if type(list_item) is dict:
                for k, v in list_item.items():
                    list_item[k] = replace_blank_dict(v)
    if type(d) is dict:
        for k, v in d.items():
            d[k] = replace_blank_dict(v)
    return d

Called as such, when looping through a list of Orders from the Shopify REST API:

    for order in orders:
        for k, v in order.items():
            order[k] = replace_blank_dict(v)

Upvotes: 2

felipecgonc
felipecgonc

Reputation: 558

As pointed out by @rodvictor and @Fcojavmelo, loading empty dicts from JSON files isn't currently possible although it' not explicitly mentioned anywhere in the BigQuery documentation, only in this issue/feature request.

In conclusion, the data has to be manually cleaned, and any empty dicts removed to avoid errors.

Upvotes: 6

Related Questions