Soroush Sotoudeh

Reputation: 198

Handling NaN values while inserting Pandas dataframes into BigQuery tables

I'm using the following code to insert a Pandas dataframe containing multiple NaN values into a BigQuery table. The dataframe is prepared in Cloud Datalab.

import google.datalab.bigquery as bq

# Create the destination table with a schema inferred from the dataframe
bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)

table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)

# Stream the dataframe rows into the table
table.insert(df)

I'm getting the following error because of the NaN values in the dataframe:

RequestException: HTTP request failed: Invalid JSON payload received. 
Unexpected token. : "user_id": NaN,
                               ^

I know that JSON does not understand NaN, but I can't simply use fillna to convert the NaN values to something else, because I need those fields to be inserted as NULL in the BigQuery table. Does anyone have a workaround for this?

Upvotes: 10

Views: 8409

Answers (2)

user1157751

Reputation: 2457

If you meant a NULL column like this:

[screenshot: table preview showing a column containing NULL values]

Can you try changing the column type to FLOAT if possible?

[screenshot: table schema with the column type changed to FLOAT]

Although this adds a trailing .0 to your user_id, queries shouldn't be affected by it unless your user_id is set as a string type.
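On the pandas side, the effect of the FLOAT approach can be sketched like this (the user_id values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical user_id column with a missing entry; as a float column,
# the gap is NaN, which a NULLABLE FLOAT field in BigQuery stores as NULL
df = pd.DataFrame({"user_id": [101, 102, None]})

# pandas upcasts the integers to float64 to represent the missing value,
# which is where the trailing .0 on the ids comes from
print(df["user_id"].dtype)     # float64
print(df["user_id"].tolist())  # [101.0, 102.0, nan]
```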

Upvotes: 0

Peter Leimbigler

Reputation: 11105

Replace all np.nan values with Python's None, then re-run your code (or try df.to_gbq):

df = df.where(pd.notnull(df), None)

I'm not experienced with Google BigQuery, and I see nothing wrong with your existing code, but it may be worth installing the pandas-gbq package and writing the DataFrame to BigQuery with df.to_gbq, as detailed in the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html
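A minimal, self-contained sketch of the replacement (column names here are hypothetical; on some pandas versions the float columns need an explicit cast to object first so None isn't coerced back to NaN, so this sketch includes it):

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: columns with missing values
df = pd.DataFrame({"user_id": [1.0, np.nan, 3.0], "name": ["a", "b", None]})

# Cast to object first so the NaN slots can actually hold Python's None
# (a None assigned into a float column may be coerced back to NaN);
# JSON serialization then emits null for these cells instead of NaN
clean = df.astype(object).where(pd.notnull(df), None)
print(clean["user_id"].tolist())  # [1.0, None, 3.0]
```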

Upvotes: 4

Related Questions