user14028637

Reputation: 1

Importing dataframes with dtypes datetime64[ns] and timedelta64[ns] to google bigquery table

I'd like to update a bigquery table from a dataframe with the following datatypes:

datetime64[ns]
timedelta64[ns]

What is the appropriate way to achieve this?

I have a dataframe with the following two columns and data.

Let's call it timestamps_df

   timestamp                  timeInStatus
0   2019-01-02 21:30:20.769     0 days 00:00:00
1   2019-11-04 17:23:59.728     305 days 19:53:38.959000
2   2019-11-04 17:24:03.613     0 days 00:00:03.885000
3   2019-11-04 17:24:07.015     0 days 00:00:03.402000
4   2019-01-08 19:41:31.706     0 days 00:00:00
5   2019-01-21 19:56:05.031     13 days 00:14:33.325000
6   2019-04-19 16:24:49.219     87 days 20:28:44.188000
7   2019-04-19 16:24:51.948     0 days 00:00:02.729000
8   2019-05-03 08:46:47.079     0 days 00:00:00
9   2019-05-03 08:46:50.072     0 days 00:00:02.993000

When I call timestamps_df.dtypes, it returns the following:

time_stamp_update     datetime64[ns]
time_in_status       timedelta64[ns]
dtype: object

I'd like to post this to a BigQuery table using the google.cloud library (importing bigquery). When I load the table manually from a CSV, the auto-detected fields in BigQuery are listed as:

timestamp = TIMESTAMP
timeInStatus = STRING

from google.cloud import bigquery

client = bigquery.Client()

# Sets the table to be updated in BigQuery
table_id = "projectName.dataSetName.tableName"

# Sets the datatypes for the BigQuery table
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("timestamp", bigquery.enums.SqlTypeNames.TIMESTAMP),
        bigquery.SchemaField("timeInStatus", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows
    # to an existing table by default, but with WRITE_TRUNCATE write
    # disposition it replaces the table with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)

job = client.load_table_from_dataframe(timestamps_df, table_id, job_config=job_config)
job.result()  # Wait for the job to complete.

Any ideas? I've tried a few things so far, but I'd really like to keep the data in the same format it's already in.

Upvotes: 0

Views: 1545

Answers (1)

Sarrión

Reputation: 77

There is no such thing as a “TIMEDELTA” data type in BigQuery; you can refer to [1] to inspect the available types.

A workaround could be to preprocess your “timeInStatus” column so that it fits the “TIME” type [2].
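A minimal sketch of that preprocessing, assuming every delta is shorter than 24 hours (“TIME” only holds a clock time, so the 305-day row in your sample would not fit it):

import pandas as pd

def timedelta_to_time(td):
    # Anchor the delta at an arbitrary midnight and keep only the resulting clock time.
    return (pd.Timestamp("1970-01-01") + td).time()

timestamps_df["timeInStatus"] = timestamps_df["timeInStatus"].apply(timedelta_to_time)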

But if you need to keep the current format, you can use the “STRING” type in BigQuery and then use pandas.to_timedelta [3] when you need to read the table back into a Python environment.
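A minimal sketch of that round trip, assuming the column names from your sample ("timestamp", "timeInStatus") and an already-authenticated client:

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
table_id = "projectName.dataSetName.tableName"

# Cast the timedelta column to its string form, e.g. "305 days 19:53:38.959000",
# so it matches the STRING field declared in the load job's schema.
df = timestamps_df.copy()
df["timeInStatus"] = df["timeInStatus"].astype(str)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("timestamp", bigquery.enums.SqlTypeNames.TIMESTAMP),
        bigquery.SchemaField("timeInStatus", bigquery.enums.SqlTypeNames.STRING),
    ],
    write_disposition="WRITE_TRUNCATE",
)
client.load_table_from_dataframe(df, table_id, job_config=job_config).result()

# When reading the table back, pandas.to_timedelta restores the original dtype.
restored = client.query(f"SELECT * FROM `{table_id}`").to_dataframe()
restored["timeInStatus"] = pd.to_timedelta(restored["timeInStatus"])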

[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types

[2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type

[3] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html

Upvotes: 1
