Alessandro Calmanovici

Reputation: 101

BigQuery does not recognise timezones when parsing

Taken from the official doc:

Time zones are used when parsing timestamps or formatting timestamps for display. The timestamp value itself does not store a specific time zone, nor does it change when you apply a time zone offset.

Time zones are represented by strings in one of these two canonical formats:

  • Offset from Coordinated Universal Time (UTC), or the letter Z for UTC
  • Time zone name from the tz database

Example: 2014-09-27 12:30:00.45 America/Los_Angeles

This is what I am trying to do:

import dateutil.parser

timestamp = dateutil.parser.isoparse(log['timestamp'])
log['local_timestamp'] = timestamp.strftime("%Y-%m-%d %H:%M:%S") + ' Europe/Zurich'

Then, via an Apache Beam Python Dataflow job, this entry is written to BigQuery and produces this error:

There were errors inserting to BigQuery. Will not retry. Errors were [{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'local_timestamp', 'debugInfo': '', 'message': 'Unrecognized timezone: Europe/Zurich'}]}, {'index': 1, 'errors': [{'reason': 'invalid', 'location': 'local_timestamp', 'debugInfo': '', 'message': 'Unrecognized timezone: Europe/Zurich'}]}]

I have tried different formats, for example appending +2:00 to the end of the timestamp, and different locations, even America/Los_Angeles as shown in the example. They all lead to "unrecognized timezone" errors. Only UTC seems to work.
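For illustration, a zero-padded offset string can be built with the standard library like this (the fixed +02:00 offset below is just an example, not taken from my data):

from datetime import datetime, timedelta, timezone

# A fixed +02:00 offset, purely for illustration.
ts = datetime(2021, 10, 20, 10, 37, 24, tzinfo=timezone(timedelta(hours=2)))

print(ts.isoformat())                      # 2021-10-20T10:37:24+02:00
print(ts.strftime("%Y-%m-%d %H:%M:%S%z"))  # 2021-10-20 10:37:24+0200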

Am I doing something wrong, or is the documentation simply incorrect and only UTC timestamps are accepted?

Thank you!

Upvotes: 2

Views: 3482

Answers (3)

Nick_Kh

Reputation: 5243

I used the basic code sample from the documentation to test the tabledata.insertAll method, which is what Dataflow uses for streaming inserts into BigQuery, simulating a similar approach to your timestamp transformation.

from google.cloud import bigquery
import dateutil.parser
import pytz

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to append to.
table_id = "your-project.your_dataset.your_table"

# Parse the timestamp and convert it to Europe/Zurich;
# str() keeps the numeric UTC offset (e.g. +02:00).
t1 = dateutil.parser.isoparse('2021-10-20 10:37:24')
t2 = str(t1.astimezone(pytz.timezone('Europe/Zurich')))

rows_to_insert = [
    {u"t1": t2}
]

errors = client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")
    print (t2)
else:
    print("Encountered errors while inserting rows: {}".format(errors))

It executed properly, retaining the Europe/Zurich timezone offset:

New rows have been added.
2021-10-20 10:37:24+02:00

Since BigQuery stores timestamps in UTC, my test record was properly converted and inserted into the target table:

(Screenshot: the inserted row shown in the BigQuery console, converted to UTC.)
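If it helps, the same idea applied to the transform from the question would look roughly like this; it is only a sketch and assumes log['timestamp'] is an ISO 8601 string that dateutil can parse:

import dateutil.parser
import pytz

def add_local_timestamp(log):
    """Sketch of the same conversion inside the question's transform."""
    ts = dateutil.parser.isoparse(log['timestamp'])
    # Assumption: if the parsed value has no timezone, treat it as UTC.
    if ts.tzinfo is None:
        ts = pytz.utc.localize(ts)
    # Convert to Europe/Zurich; str() keeps the numeric offset, e.g. '+02:00'.
    log['local_timestamp'] = str(ts.astimezone(pytz.timezone('Europe/Zurich')))
    return log

The resulting string keeps a numeric UTC offset, which is the same form that insert_rows_json accepted above.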

I hope this clarifies the suggestions I gave in the comments.

Upvotes: 1

CarlesCF

Reputation: 205

If you are using batch inserts into BigQuery, you may want to check the files that are generated in the temporary Cloud Storage path before they are loaded. With that information you can tell whether it is a Dataflow issue (most likely the format generated by your code is not valid) or a BigQuery issue (the timezone is not being accepted for some odd reason).
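If it helps, here is a rough sketch of forcing the file-loads path through a known temporary location so those intermediate files can be inspected; the bucket path, table, and schema are placeholders, not taken from the question:

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | 'CreateTestRow' >> beam.Create([{'local_timestamp': '2021-10-20 10:37:24+02:00'}])
        | 'WriteToBQ' >> beam.io.WriteToBigQuery(
            'your-project:your_dataset.your_table',
            schema='local_timestamp:TIMESTAMP',
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            # Files staged here before the load job can be inspected, e.g. with gsutil.
            custom_gcs_temp_location='gs://your-bucket/bq_load_temp',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

The files Dataflow stages under that path show the exact string produced for local_timestamp before BigQuery ever parses it.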

Upvotes: 0

Mr.Batra

Reputation: 833

No, BigQuery accepts timezones other than UTC too.

SELECT
  CURRENT_TIMESTAMP() AS datetime_ymdhms,
  DATETIME(CURRENT_TIMESTAMP(), "Europe/Zurich") AS datetime_tstz;

(Screenshot: query result showing the UTC timestamp and the Europe/Zurich datetime.)

My guess is that the problem is the leading space in ' Europe/Zurich'. Can you try 'Europe/Zurich' instead?

Upvotes: 0
