Reputation: 78
I am using GCP's Dataflow to process data and write it to Bigtable. Some jobs run indefinitely, never finishing the WriteToBigTable
step, and the chance of the step hanging grows as the number of rows to be written gets larger.
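For context, the write step in my pipeline looks roughly like the sketch below. This is a minimal reconstruction, not my exact code: the project, instance, and table IDs, the column family, and the input records are all placeholders.

```python
import datetime

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.bigtable.row import DirectRow


def to_direct_row(record):
    # Turn one input record into a Bigtable mutation.
    # "cf1" is a placeholder column family assumed to exist on the table.
    row = DirectRow(row_key=record["key"].encode("utf-8"))
    row.set_cell(
        "cf1",
        b"value",
        record["value"].encode("utf-8"),
        timestamp=datetime.datetime.utcnow(),
    )
    return row


# Plus the usual Dataflow options (runner, region, temp_location, ...).
options = PipelineOptions(project="my-project")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.Create([{"key": "r1", "value": "v1"}])
        | "ToRows" >> beam.Map(to_direct_row)
        | "Write" >> WriteToBigTable(
            project_id="my-project",
            instance_id="my-instance",
            table_id="my-table",
        )
    )
```

The hang shows up in the "Write" step; the more rows flow into it, the more likely the job never finishes.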
Inspecting the logs more deeply, I found the following difference between the finishing and the non-finishing jobs:
Unable to perform SDK-split for work-id: 5193980908353266575 due to error: INTERNAL: Empty split returned. [type.googleapis.com/util.MessageSetPayload='[dist_proc.dax.internal.TrailProto] { trail_point { source_file_loc { filepath: "dist_proc/dax/workflow/worker/fnapi_operators.cc" line: 2738 } } }']
=== Source Location Trace: ===
dist_proc/dax/internal/status_utils.cc:236
and:
Could not Checkpoint reader due to error: OUT_OF_RANGE: Cannot checkpoint when range tracker is finished. [type.googleapis.com/util.MessageSetPayload='[dist_proc.dax.internal.TrailProto] { trail_point { source_file_loc { filepath: "dist_proc/dax/workflow/worker/operator.cc" line: 340 } } }']
=== Source Location Trace: ===
dist_proc/dax/io/dax_reader_driver.cc:253
dist_proc/dax/workflow/worker/operator.cc:340
There used to be a workaround for this problem: setting the flag use_cross_language=True
to use the Java SDK. However, that option was recently disabled on GCP Dataflow (the workaround did work previously).
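As far as I understand it, the workaround just meant replacing the write step from the sketch above with the cross-language variant, which delegates to the Java Bigtable sink through an expansion service (IDs are still placeholders):

```python
        | "Write" >> WriteToBigTable(
            project_id="my-project",
            instance_id="my-instance",
            table_id="my-table",
            # Delegate to the Java implementation; recently disabled on Dataflow.
            use_cross_language=True,
        )
```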
Also, I saw that SDK version 2.51.0
has just become supported on Dataflow. I'll try that version to see whether the issue is resolved.
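If anyone wants to reproduce the test: my plan is simply to pin the SDK in the environment that launches the job (my understanding is that Dataflow picks up the SDK version of the submitting environment) and verify it before submitting:

```python
# In the launch environment, first run:
#   pip install 'apache-beam[gcp]==2.51.0'
# then sanity-check the version before submitting the job:
import apache_beam

assert apache_beam.__version__ == "2.51.0", apache_beam.__version__
```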
Upvotes: 1
Views: 223