Kenta Kozuka

Reputation: 47

Google BigQuery: Last modified datetime of a row

I am trying to measure the duration of a Dataflow pipeline that pulls messages from Pub/Sub and loads them into a BigQuery table. The table itself has a last-modified datetime, but I cannot find a way to get the last-modified time of an individual row. Does anyone know how to record a last-modified datetime on each row of a BigQuery table?

Upvotes: 0

Views: 3447

Answers (1)

mremes

Reputation: 226

Have the application that creates the output data structure include the current timestamp in each record. That serves as the event time in some sense (you can get finer granularity by recording event times on the client or on the server, depending on where your events originate).

You probably also want to record the time just before processing starts (right after the message is read from Pub/Sub), and again right before you write to BigQuery.

You can do both with a DoFn added as an extra step, or by making the timestamping the first action of the first transform and the last action of the last transform in your pipeline.
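As a rough illustration of the stamping step, here is a minimal sketch in plain Python. It assumes each element is a dict destined to become a BigQuery row; the function name `add_timestamp` and the field names `received_at` and `written_at` are made up for this example, and the ISO 8601 string format is assumed to be acceptable for your TIMESTAMP columns.

```python
from datetime import datetime, timezone


def add_timestamp(row, field, now=None):
    """Return a copy of `row` with `field` set to a UTC ISO 8601 timestamp.

    `now` can be passed explicitly to make the function testable;
    otherwise the current UTC time is used.
    """
    stamped = dict(row)  # copy so the input element is not mutated
    current = now if now is not None else datetime.now(timezone.utc)
    stamped[field] = current.isoformat()
    return stamped


# Possible use inside a Beam pipeline (field names are hypothetical):
#   rows | beam.Map(add_timestamp, "received_at")   # right after the Pub/Sub read
#   ...  | beam.Map(add_timestamp, "written_at")    # right before the BigQuery write
```

The same logic could equally live inside the `process` method of a DoFn; a plain function passed to `beam.Map` is just the shortest way to show it.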

Add these new columns to the schema of the output BigQuery table.
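To make the schema change and the resulting measurement concrete, here is a small sketch. The column names `event_time`, `received_at`, and `written_at` and both helper functions are assumptions for illustration; the schema is written in the comma-separated `name:TYPE` string form, and the timestamps are assumed to be ISO 8601 strings with a UTC offset.

```python
from datetime import datetime

# Hypothetical names for the three timestamp columns, in pipeline order.
TIMESTAMP_COLUMNS = ["event_time", "received_at", "written_at"]


def extend_schema(base_schema, columns=TIMESTAMP_COLUMNS):
    """Append TIMESTAMP columns to a 'name:TYPE,...' schema string."""
    extra = ",".join(f"{name}:TIMESTAMP" for name in columns)
    return f"{base_schema},{extra}" if base_schema else extra


def stage_durations(row):
    """Return the seconds elapsed between the recorded timestamps of one row."""
    event, received, written = (
        datetime.fromisoformat(row[name]) for name in TIMESTAMP_COLUMNS
    )
    return {
        "publish_to_read_s": (received - event).total_seconds(),
        "read_to_write_s": (written - received).total_seconds(),
        "end_to_end_s": (written - event).total_seconds(),
    }
```

Note that `written_at` is taken right before the write, so it does not include the time BigQuery itself needs to commit the row.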

Upvotes: 1
