Johannes Bauer

Reputation: 528

Schema conflict when storing dataframes with datetime objects using load_table_from_dataframe()

I'm trying to load data from a Pandas DataFrame into a BigQuery table. The DataFrame has a column of dtype datetime64[ns], and when I try to store the df using load_table_from_dataframe(), I get

google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table [table name]. Field computation_triggered_time has changed type from DATETIME to TIMESTAMP.
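For reference, the load call is essentially the following (the client setup and table id here are placeholders, not my real values):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table id in the usual project.dataset.table form
table_id = 'my-project.my_dataset.my_table'

job = client.load_table_from_dataframe(df, table_id)
job.result()  # wait for the load job; this is where the 400 is raised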

The table has a schema which reads

CREATE TABLE `[table name]` (
  ...
  computation_triggered_time    DATETIME  NOT NULL,
  ...
)

In the DataFrame, computation_triggered_time is a datetime64[ns] column. When I read the original DataFrame from CSV, I convert it from text to datetime like so:

import pandas as pd

df['computation_triggered_time'] = \
  pd.to_datetime(df['computation_triggered_time']).values.astype('datetime64[ms]')

Note:

The .values.astype('datetime64[ms]') part is necessary because load_table_from_dataframe() uses PyArrow to serialize the df, and that fails if the data has nanosecond precision. The error is something like

[...] Casting from timestamp[ns] to timestamp[ms] would lose data
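An alternative to the astype() cast, which keeps the column as a regular pandas datetime column, would be to truncate with .dt.floor(). As far as I can tell, PyArrow's safe cast only fails when values would actually lose precision, so zeroing the sub-millisecond part should avoid the error as well:

# Zero out everything below millisecond precision; the dtype stays
# datetime64[ns], but a safe cast to timestamp[ms] no longer loses data.
df['computation_triggered_time'] = df['computation_triggered_time'].dt.floor('ms')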

Upvotes: 1

Views: 1622

Answers (1)

Wes McKinney

Reputation: 105611

This looks like a problem with Google's google-cloud-python package; can you report the bug there? https://github.com/googleapis/google-cloud-python
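In the meantime, one possible workaround (untested on my side, and assuming the google-cloud-bigquery client API) is to give the load job an explicit schema, so the column type isn't inferred as TIMESTAMP from the pandas dtype:

from google.cloud import bigquery

client = bigquery.Client()

# Pin the column to DATETIME instead of letting the loader infer
# TIMESTAMP; the table id below is a placeholder.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField('computation_triggered_time', 'DATETIME')],
)

job = client.load_table_from_dataframe(
    df, 'my-project.my_dataset.my_table', job_config=job_config
)
job.result()  # raises if the load still fails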

Upvotes: 1
