Reputation: 1
I'd like to update a BigQuery table from a dataframe with the following datatypes:
datetime64[ns]
timedelta64[ns]
What is the appropriate way to achieve this?
I have a dataframe with the following two columns and data.
Let's call it timestamps_df
timestamp timeInStatus
0 2019-01-02 21:30:20.769 0 days 00:00:00
1 2019-11-04 17:23:59.728 305 days 19:53:38.959000
2 2019-11-04 17:24:03.613 0 days 00:00:03.885000
3 2019-11-04 17:24:07.015 0 days 00:00:03.402000
4 2019-01-08 19:41:31.706 0 days 00:00:00
5 2019-01-21 19:56:05.031 13 days 00:14:33.325000
6 2019-04-19 16:24:49.219 87 days 20:28:44.188000
7 2019-04-19 16:24:51.948 0 days 00:00:02.729000
8 2019-05-03 08:46:47.079 0 days 00:00:00
9 2019-05-03 08:46:50.072 0 days 00:00:02.993000
When I call timestamps_df.dtypes, it returns the following:
time_stamp_update datetime64[ns]
time_in_status timedelta64[ns]
dtype: object
I'd like to post this to a BigQuery table using the google-cloud-bigquery library. When I load the table manually from a CSV, BigQuery autodetects the fields as:
timestamp = TIMESTAMP
timeInStatus = STRING
from google.cloud import bigquery

client = bigquery.Client()

# sets the table to be updated in BigQuery
table_id = "projectName.dataSetName.tableName"

# sets the datatypes for the BigQuery table
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("timestamp", bigquery.enums.SqlTypeNames.TIMESTAMP),
        bigquery.SchemaField("timeInStatus", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows
    # to an existing table by default, but with WRITE_TRUNCATE write
    # disposition it replaces the table with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)

job = client.load_table_from_dataframe(timestamps_df, table_id, job_config=job_config)
job.result()  # Wait for the job to complete.
Any ideas? I've tried a few things so far, but I would really like to keep the data looking in the same format that it already is.
Upvotes: 0
Views: 1545
Reputation: 77
There is no such thing as a "TIMEDELTA" data type in BigQuery; you can refer to [1] to inspect the available ones.
A workaround could be to preprocess your "timeInStatus" column so that it fits the "TIME" type [2].
But if you need to keep the current format, you can use the "STRING" type in BigQuery and parse the column back with [3] when you import the table into a Python environment.
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
[2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type
[3] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html
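As a minimal sketch of the STRING round trip described above (the column values are taken from the question; the conversion calls are standard pandas):

```python
import pandas as pd

# Sample timedelta column like the question's "timeInStatus".
df = pd.DataFrame({
    "timeInStatus": pd.to_timedelta([
        "0 days 00:00:00",
        "305 days 19:53:38.959000",
    ])
})

# Before loading to BigQuery: cast to str so it matches the STRING field.
df["timeInStatus"] = df["timeInStatus"].astype(str)
print(df["timeInStatus"].iloc[1])   # "305 days 19:53:38.959000"

# After reading the table back into pandas: restore timedelta64[ns].
df["timeInStatus"] = pd.to_timedelta(df["timeInStatus"])
print(df["timeInStatus"].dtype)     # timedelta64[ns]
```

The string form that astype(str) produces ("305 days 19:53:38.959000") is exactly what pd.to_timedelta accepts, so no information is lost in the round trip.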
Upvotes: 1