Roman
Roman

Reputation: 97

Error with inserting data into BigQuery table

I try to insert data from pandas dataframe into GBQ table and got an "invalid data error". GBQ table has following schema:

Field name Type Mode
id STRING REQUIRED
order_id STRING REQUIRED
action_date DATE NULLABLE
product_name STRING NULLABLE
order_sum FLOAT NULLABLE
website_id STRING NULLABLE
website_name STRING NULLABLE
webmaster_id STRING NULLABLE
webmaster STRING NULLABLE

and dataFrame has such structure:

id order_id action_date product_name order_sum website_id website_name webmaster_id webmaster_name
830339411 970561 2022-02-25 product_1 1000.0 123 site 1 456 webmaster 1
830339412 970562 2022-02-25 product_2 1500.0 120 site 2 456 webmaster 1

and dtypes:

column type
id object
order_id object
action_date object
product_name object
order_sum float64
website_id object
website_name object
webmaster_id object
webmaster object

originally action_date column had a format like 2022-02-25T20:31:02 but I have transformed it to 2022-02-25 with

all_orders['action_time'] = pd.to_datetime(df['action_time'])
all_orders['action_date'] = all_orders['action_time'].dt.date

so when I try to insert some rows to GBQ I get an error:

'errors': [{'reason': 'invalid', 'location': 'action_date', 'debugInfo': '', 'message': "Invalid date: '1644019200000'"}]

It looks like GBQ consider a date in action_date column as unix timestamp. How to fix it?

Upvotes: 1

Views: 1875

Answers (1)

ewertonvsilva
ewertonvsilva

Reputation: 1955

You don't need to parse the 2022-02-25T20:31:02 format before write, use the following and write to the table.

dataframe['action_time'] = pandas.to_datetime(dataframe['action_time'], infer_datetime_format=True)

Here is an full functional example on writing dates to BigQuery:

import datetime
from google.cloud import bigquery
import pandas
import pytz

client = bigquery.Client()

table_id = "<project>.<ds>.<table>"

records = [
    {
        "date": '2022-12-25T20:31:02',
        "data": "Final Teste",
    },
]
dataframe = pandas.DataFrame(
    records,
    columns=[
        "date",
        "data",
    ],
)


dataframe['date'] = pandas.to_datetime(dataframe['date'], infer_datetime_format=True)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("date", bigquery.enums.SqlTypeNames.DATE),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.STRING),
    ],)

job = client.load_table_from_dataframe(dataframe, table_id) 
job.result()  

table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id))

result on table: enter image description here

Upvotes: 1

Related Questions