caseyr

Reputation: 81

pandas.to_gbq() returning "ArrowTypeError: Expected bytes, got a 'datetime.date' object" error

pandas.to_gbq() has recently started returning an error when I attempt to append a dataframe to a BigQuery table, despite the df schema/data types being the exact same as those of the BigQuery table.

Code snippet below:

df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,
          project_id = PROJECT_NAME,
          if_exists = 'append')

Returns:

  File ~\Documents\DartsModel\update_processed_visit_data\main_dev.py:152 in <module>
    df.to_gbq(destination_table = PROCESSED_DATA_TABLE_NAME,

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\core\frame.py:2054 in to_gbq
    gbq.to_gbq(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas\io\gbq.py:212 in to_gbq
    pandas_gbq.to_gbq(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:1198 in to_gbq
    connector.load_data(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\gbq.py:591 in load_data
    chunks = load.load_chunks(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:238 in load_chunks
    load_parquet(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\pandas_gbq\load.py:130 in load_parquet
    client.load_table_from_dataframe(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\client.py:2628 in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:672 in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:617 in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)

  File ~\Anaconda3\envs\darts_model\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py:342 in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)

  File pyarrow\array.pxi:1033 in pyarrow.lib.Array.from_pandas

  File pyarrow\array.pxi:312 in pyarrow.lib.array

  File pyarrow\array.pxi:83 in pyarrow.lib._ndarray_to_array

  File pyarrow\error.pxi:123 in pyarrow.lib.check_status

ArrowTypeError: Expected bytes, got a 'datetime.date' object

Relevant package versions below:

python==3.9.12
pandas==1.4.2
pandas-gbq==0.17.6
arrow==1.2.2
google-cloud-bigquery==3.2.0
google-cloud-bigquery-storage==2.13.2

Can't seem to find any solutions online so any help would be greatly appreciated! Thanks.

Upvotes: 8

Views: 15223

Answers (3)

Steve W

Reputation: 534

I have seen a similar issue that I can't immediately replicate, but I can describe it. I have pandas DataFrames with about 500,000 64-bit integers that I convert to strings and write to Parquet files using PyArrow. This has always worked fine.

However, a new record was recently created with an ID beginning with the four digits 1001, and on that day the process that had always worked broke. Either it was a coincidence, or PyArrow saw that value, decided it was receiving bytes, and threw the ArrowTypeError ("Expected bytes, got a 'int' object").
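For what it's worth, here is a minimal sketch (not the exact pipeline above, and the column values are hypothetical) of how this class of error can arise: when an object-dtype column that is supposed to be strings contains a raw Python int, PyArrow refuses to coerce it and raises the same ArrowTypeError.

```python
import pandas as pd
import pyarrow as pa

# Hypothetical column of IDs that were supposed to be strings,
# but one row slipped through as a raw Python int.
ids = pd.Series(["1000123456", "1000234567", 1001], dtype="object")

# Building a string Arrow array from this object column fails, because
# PyArrow will not silently coerce the int to a string:
# ArrowTypeError: Expected bytes, got a 'int' object
try:
    pa.Array.from_pandas(ids, type=pa.string())
except pa.lib.ArrowTypeError as exc:
    print(exc)

# Converting the whole column to str first avoids the error.
pa.Array.from_pandas(ids.astype(str), type=pa.string())
```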

Upvotes: 0

Doracahl

Reputation: 532

An alternative to the to_gbq() method is to use Google Cloud's google-cloud-bigquery package directly.

Provided the schema of the BigQuery table and the local df match, appending to the table can be accomplished with the following code:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# define project, dataset, and table_name variables
project, dataset, table_name = "project", "dataset", "table_name"
table_id = f"{project}.{dataset}.{table_name}"

job_config = bigquery.job.LoadJobConfig()

# set write_disposition parameter as WRITE_APPEND for appending to table
job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND

job = client.load_table_from_dataframe(df, table_id, job_config=job_config)

job.result()  # Wait for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    f"Loaded {table.num_rows} rows and {len(table.schema)} columns to {table_id}"
)

The end result is the same: the dataframe is appended to the BigQuery table.

Upvotes: 4

Anonymous

Reputation: 1115

One of your columns probably has a dtype of 'object', but the underlying values are actually datetime.date objects. You can convert the column as follows to make to_gbq() work. Note that this relies on the destination column type being TIMESTAMP, though.

import datetime

import pandas as pd

# Find object-dtype columns whose values are datetime.date objects
# and convert them to pandas datetimes so pyarrow can serialise them.
object_cols = df.select_dtypes(include=['object']).columns

for column in object_cols:
    if type(df[column].values[0]) is datetime.date:
        df[column] = pd.to_datetime(df[column], infer_datetime_format=True)

df.to_gbq(destination_table=PROCESSED_DATA_TABLE_NAME, if_exists='replace')

Note that this will replace the table. If you want to append instead, you need to make sure the types match by passing the table_schema parameter to to_gbq().
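For the append case, here is a minimal sketch of that approach. The column name visit_date is hypothetical, and the table_schema entries must match your actual BigQuery schema (pandas-gbq infers the types of any columns you leave out):

```python
# Hypothetical example: leave the date column as datetime.date objects and
# tell pandas-gbq the target column is a BigQuery DATE, so no TIMESTAMP
# conversion is needed before appending.
df.to_gbq(
    destination_table=PROCESSED_DATA_TABLE_NAME,
    project_id=PROJECT_NAME,
    if_exists='append',
    table_schema=[
        {'name': 'visit_date', 'type': 'DATE'},
        # ...entries for other columns as needed
    ],
)
```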

Upvotes: 4
