Reputation: 198
I'm using the following code to insert a Pandas DataFrame with multiple NaN values into a BigQuery table. The DataFrame is prepared in Cloud Datalab.
import google.datalab.bigquery as bq
bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)
table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)
table.insert(df)
I'm getting the following error because of the NaN values in the dataframe:
RequestException: HTTP request failed: Invalid JSON payload received.
Unexpected token. : "user_id": NaN,
^
I know that JSON does not understand NaN, but I can't just use fillna to convert those NaN values to something else, as I need those fields inserted as null in the BigQuery table.
Does anyone have a workaround for this?
Upvotes: 10
Views: 8409
Reputation: 2457
If you meant a NULL column in the BigQuery table: can you try changing the column type to FLOAT if possible? Although this adds a trailing .0 to your user_id, queries shouldn't be affected by it unless your user_id is set as a string type.
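As a small sketch of what that looks like on the pandas side (sample user_id values assumed, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: an id column with a missing value
df = pd.DataFrame({"user_id": [101, 102, np.nan]})

# Casting to float keeps NaN (plain int64 cannot hold NaN),
# and a BigQuery FLOAT column stores those rows as NULL
df["user_id"] = df["user_id"].astype(float)
print(df["user_id"].tolist())  # note the trailing .0 on the intact ids
```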
Upvotes: 0
Reputation: 11105
Replace all np.nan values with Python's None value, then re-run your code (or try df.to_gbq):
df = df.where(pd.notnull(df), None)
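To illustrate the effect (sample frame assumed; the extra astype(object) cast is a defensive variant that keeps pandas from coercing None back to NaN in numeric columns):

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample frame with a missing user_id
df = pd.DataFrame({"user_id": [101.0, np.nan], "name": ["alice", "bob"]})

# Cast to object first so the float column can actually hold None
clean = df.astype(object).where(pd.notnull(df), None)

# None serializes to JSON null, which BigQuery accepts for nullable fields
payload = json.dumps(clean.to_dict(orient="records"))
print(payload)
```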
I'm not experienced with Google BigQuery, and I see nothing wrong with your existing code, but it may be worth installing the pandas-gbq package. Then try writing the DataFrame to GBQ with df.to_gbq, as detailed in the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html
Upvotes: 4