Reputation: 158
How do you add the time zone into a dataframe timestamp that you upload to Bigquery (BQ)?
When uploading a dataframe that includes timestamps to Google BQ you need to specify the UTC offset or the time zone name to make sure you keep the correct timezone. If you don't, BQ will automatically convert your timestamp to UTC.
The timestamp format that BQ can read has the fallowing canonical format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]
My question is about setting the [time zone]
part correctly. According to BQ documentation:
A string-formatted timestamp may include a time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used. Time zones are represented by strings in one of these two canonical formats:
I have a dataframe which includes timestamps that have been converted to a local time zone with the below command:
df['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Berlin')
The outcomes look like this and includes a UTC offset:
2019-02-15 00:02:26+01:00
and
2019-04-01 00:03:40+02:00
(Daylight saving time)
However, when uploading this into BQ through the API, the values are converted to UTC. And I receive the following values.
2019-02-15 00:01:26 UTC
and
2019-04-01 00:01:40 UTC
What could be the case for BigQuery not accepting my Timestamp formatting?
I load the job with these lines of code:
dataset_ref = client.dataset('dataset_name')
table_ref = dataset_ref.table('table_name')
job_config = bigquery.LoadJobConfig()
job_config.autodetect = True
job_config.write_disposition = 'WRITE_TRUNCATE'
client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()
Upvotes: 1
Views: 6143
Reputation: 33705
BigQuery does not store the time zone alongside TIMESTAMP values. When you query a TIMESTAMP column, you need to pass a time zone if you want the results to reflect it, e.g. STRING (timestamp, '+02:00')
. One option is to store the time zone name (or hour offset from UTC) in a separate column. When you want the query results to use that original time zone, you can use that column, e.g. STRING(timestamp, time_zone)
.
Upvotes: 3