PanJ

Reputation: 78

Apache Beam Python's WriteToBigtable sometimes causes a step to keep running infinitely on Dataflow

I am using GCP's Dataflow to process data and write it to Bigtable. Some jobs end up running forever because of the WriteToBigtable step, and the chance of the step hanging increases as the number of rows to be written gets larger.
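For context, the write stage is just a plain WriteToBigTable sink at the end of the pipeline. A minimal sketch of that shape (the project/instance/table IDs and the row-building logic below are placeholders, not my actual job):

import datetime

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow


def to_direct_row(record):
    # Build one Bigtable DirectRow per input record (placeholder schema).
    row = DirectRow(row_key=record["key"].encode("utf-8"))
    row.set_cell(
        "cf1",                            # column family (placeholder)
        b"value",                         # column qualifier (placeholder)
        record["value"].encode("utf-8"),
        timestamp=datetime.datetime.utcnow(),
    )
    return row


def run():
    # The real job runs on Dataflow with the usual GCP options (project, region, temp_location, ...).
    options = PipelineOptions(runner="DataflowRunner")
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadSource" >> beam.Create([{"key": "k1", "value": "v1"}])  # stand-in for the real source
            | "ToRows" >> beam.Map(to_direct_row)
            | "WriteToBT" >> WriteToBigTable(
                project_id="my-project",
                instance_id="my-instance",
                table_id="my-table",
            )
        )


if __name__ == "__main__":
    run()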

Digging into the logs, I found the following difference between jobs that finish and jobs that do not:

Unable to perform SDK-split for work-id: 5193980908353266575 due to error: INTERNAL: Empty split returned. [type.googleapis.com/util.MessageSetPayload='[dist_proc.dax.internal.TrailProto] { trail_point { source_file_loc { filepath: "dist_proc/dax/workflow/worker/fnapi_operators.cc" line: 2738 } } }']
=== Source Location Trace: ===
dist_proc/dax/internal/status_utils.cc:236
 And could not Checkpoint reader due to error: OUT_OF_RANGE: Cannot checkpoint when range tracker is finished. [type.googleapis.com/util.MessageSetPayload='[dist_proc.dax.internal.TrailProto] { trail_point { source_file_loc { filepath: "dist_proc/dax/workflow/worker/operator.cc" line: 340 } } }']
=== Source Location Trace: ===
dist_proc/dax/io/dax_reader_driver.cc:253
dist_proc/dax/workflow/worker/operator.cc:340

There used to be a workaround for this problem: setting the flag use_cross_language=True so that the Java SDK implementation is used instead. But it was recently disabled on GCP Dataflow (the workaround did work previously).
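For reference, this is roughly how I had been applying the workaround. Note that putting the flag directly on the transform is my reading of it; the exact placement may differ, and the IDs are placeholders:

# Workaround as I understood it: ask the Python transform to delegate to the
# Java SDK implementation. This is what Dataflow now appears to reject.
WriteToBigTable(
    project_id="my-project",
    instance_id="my-instance",
    table_id="my-table",
    use_cross_language=True,
)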

Also, I saw that 2.51.0 has just become supported on Dataflow. I'll try this version to see whether the issue is resolved.

Upvotes: 1

Views: 223

Answers (0)
