Reputation: 3328
I have had a successful pipeline streaming data from Pub/Sub into BigQuery using Cloud Dataflow, which is running on a compute instance rather than on an actual Dataflow runner.
Today I updated the BQ table schema, and no new inserts seem to occur. I can view logs on the machine and all is fine: Dataflow is not reporting any errors.
Is there any way to access streaming logs from BigQuery to check for errors?
EDIT: To summarise, my question is whether I can get more verbose logging, either from the Apache Beam SDK or from BigQuery, to see where this data is ending up.
I have had a look in Stackdriver, but this does not seem to create entries for streaming logs.
Upvotes: 2
Views: 1467
Reputation: 11021
In versions 2.15 and 2.16, Beam now produces a deadletter PCollection containing all of the rows that failed to be inserted.
This behaviour is configurable with the insert_retry_policy parameter. The default for 2.15 and 2.16 is RETRY_ON_TRANSIENT_ERRORS. Starting with 2.17, the default will be RETRY_ALWAYS.
You would do the following:
result = my_collection | WriteToBigQuery(...,
                                         method='STREAMING_INSERTS', ...)
failed_rows = result['FailedRows']  # You can consume this PCollection.
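To make the failed rows visible in the worker logs, one option is to map a small logging function over failed_rows. The sketch below is only an illustration: the helper names describe_failed_row and log_failed_row are hypothetical, and it assumes each element is a (destination, row) tuple, optionally followed by error details. Check the element shape for your Beam version before relying on it.

```python
import json
import logging


def describe_failed_row(element):
    """Render one failed-row element as a JSON string for logging.

    Assumption: the element is a (destination, row) tuple, optionally
    followed by a third item holding error details.
    """
    destination, row, *errors = element
    message = {"destination": str(destination), "row": row}
    if errors:
        message["errors"] = errors[0]
    # default=str keeps non-JSON values (e.g. datetimes) from raising.
    return json.dumps(message, sort_keys=True, default=str)


def log_failed_row(element):
    """Log the failed row at ERROR level and pass it through unchanged."""
    logging.error("BigQuery insert failed: %s", describe_failed_row(element))
    return element


# Wiring into the pipeline (requires apache_beam, so it is left as a comment):
#   import apache_beam as beam
#   _ = failed_rows | 'LogFailedRows' >> beam.Map(log_failed_row)
```

Because log_failed_row returns the element, the same PCollection can still be written elsewhere afterwards, for example to a file or a separate table.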
You may also choose to always retry:
result = my_collection | WriteToBigQuery(...,
                                         insert_retry_policy='RETRY_ALWAYS',
                                         method='STREAMING_INSERTS', ...)
With this policy nothing is ever output to failed_rows, and your pipeline may run forever.
Upvotes: 2
Reputation: 176
You should be able to get your streaming logs from BigQuery; please take a look at these docs [1][2]. Be aware that changes to a table's schema can take several minutes to propagate, and a table that has recently received streaming inserts may respond with schema mismatch errors in the meantime.
In this case, when BigQuery encounters a schema mismatch on individual rows in the request, none of the rows are inserted, and an insertErrors entry is returned for each row, including detailed information about the schema mismatch.
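As a rough illustration of what those insertErrors entries look like and how to read them, here is a minimal sketch that flattens a response into one line per error. The function name summarize_insert_errors and the sample payload are made up for this example; the field names (index, errors, reason, location, message) follow the shape of a streaming insert response as I understand it, so verify them against a real response.

```python
def summarize_insert_errors(response):
    """Flatten the insertErrors of a streaming insert response into strings."""
    lines = []
    for entry in response.get("insertErrors", []):
        for err in entry.get("errors", []):
            lines.append(
                "row {}: {} at {!r}: {}".format(
                    entry.get("index"),
                    err.get("reason"),
                    err.get("location"),
                    err.get("message"),
                )
            )
    return lines


# Hypothetical response: row 0 carries a field the table does not know about,
# and row 1 is rejected only because another row in the request failed.
sample_response = {
    "insertErrors": [
        {"index": 0, "errors": [{"reason": "invalid",
                                 "location": "new_field",
                                 "message": "no such field."}]},
        {"index": 1, "errors": [{"reason": "stopped",
                                 "location": "",
                                 "message": ""}]},
    ]
}

for line in summarize_insert_errors(sample_response):
    print(line)
```

A response without an insertErrors key yields an empty list, which means every row in that request was accepted.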
[1] https://cloud.google.com/bigquery/troubleshooting-errors#streaming
[2] https://cloud.google.com/bigquery/docs/reference/auditlogs/#mapping_audit_entries_to_log_streams
Upvotes: 0