Animesh Dayal

Reputation: 53

Latency while updating BigQuery schema

I am facing latency issues when updating a BigQuery schema.

I have a table that receives streaming inserts, and its schema is updated automatically whenever needed. The issue is that the schema update doesn't seem to take effect for some time, and inserts made during that window drop the values of the new columns.

I found this answer from 2016 that says there can be delays of up to 5 minutes before changes take effect.

Is this still the case, and how do you work around it? If a timeout is the answer, how long should you wait before writing to the new columns?

Upvotes: 1

Views: 1134

Answers (1)

Nick_Kh

Reputation: 5253

For more detailed background on the subject, I would encourage you to check out this well-written article, which walks through the life cycle of BigQuery streaming inserts made with the tabledata.insertAll BigQuery REST API method.

As the documentation notes, data availability and consistency are the most important requirements when ingesting data for real-time analysis:

Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.

So, for cases where metadata changes are needed in line with streaming ingests, the documentation confirms that there is a delay before they take effect. Even the caching mechanism that gathers table metadata does not guarantee that a change is visible right away, for example when streaming into a table or columns that were created only moments earlier. BigQuery is a serverless platform originally built on top of the Dremel model, and given its complexity it is hard to estimate the latency for a particular high-throughput streaming task, which is why no exact figure is documented in the GCP knowledge base.

Meanwhile, in this Stack thread, @Sean Chen recommends applying BigQuery metadata changes ahead of time, before launching the streaming ingests.
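If the schema cannot be changed ahead of time, one possible workaround is to retry the insert with a growing delay instead of relying on a fixed timeout. Below is a minimal sketch of that idea, not an official recipe. The helper name `insert_with_retry`, its parameters, and the 600-second default are my own assumptions (the documentation only says "within minutes"). It also assumes that `insert_fn` returns a list of errors that is empty on success, and that rows carrying not-yet-visible columns are rejected rather than silently trimmed.

```python
import time
from typing import Callable, Sequence


def insert_with_retry(
    insert_fn: Callable[[Sequence[dict]], list],
    rows: Sequence[dict],
    max_wait_s: float = 600.0,      # assumed upper bound, not a documented value
    initial_delay_s: float = 5.0,   # assumed starting delay
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call insert_fn(rows) until it reports no errors or max_wait_s is used up.

    Returns the total number of seconds spent waiting.
    Raises RuntimeError if the insert still fails after max_wait_s.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        errors = insert_fn(rows)
        if not errors:
            return waited
        if waited >= max_wait_s:
            raise RuntimeError(
                f"insert still failing after {waited:.0f}s: {errors}"
            )
        # Exponential backoff, capped so the total wait never exceeds max_wait_s.
        pause = min(delay, max_wait_s - waited)
        sleep(pause)
        waited += pause
        delay *= 2
```

With the google-cloud-bigquery Python client, `insert_fn` could be something like `lambda r: client.insert_rows_json(table_id, r, ignore_unknown_values=False)`, where `client` and `table_id` are placeholders for your own objects. `insert_rows_json` returns a list of per-row errors that is empty when every row was accepted. Leaving unknown values not ignored matters here, since otherwise the new columns would be dropped without any error to retry on.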

Upvotes: 1
