Reputation: 381
I'm reading data from a Kafka topic that has a lot of data. Once Flink starts reading, it starts up fine, then crashes after some time when backpressure hits 100%, and then goes into an endless cycle of restarts.
My question is: shouldn't Flink's backpressure mechanism come into play and reduce consumption from the topic until the in-flight data is consumed or the backpressure eases, as described in this article: https://www.ververica.com/blog/how-flink-handles-backpressure? Or do I need to set some config that I'm missing? Is there any other solution to avoid this restart cycle when backpressure increases?
I've tried these configs:
taskmanager.network.memory.buffer-debloat.enabled: true
taskmanager.network.memory.buffer-debloat.samples: 5
My modules.yaml has this config for the transport:
spec:
  functions: function_name
  urlPathTemplate: http://nginx:8888/function_name
  maxNumBatchRequests: 50
  transport:
    timeouts:
      call: 2 min
      connect: 2 min
      read: 2 min
      write: 2 min
Upvotes: 0
Views: 393
Reputation: 43707
You should look in the logs to determine what exactly is causing the crash and restart, but typically when backpressure is involved in a restart, it's because a checkpoint timed out. You could increase the checkpoint timeout.
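For example, assuming you configure checkpointing through flink-conf.yaml rather than in code, raising the timeout would look something like this (30 min is just an illustrative value, not a recommendation):
execution.checkpointing.timeout: 30 min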
However, it's better if you can reduce/eliminate the backpressure. One common cause of backpressure is not providing Flink with enough resources to keep up with the load. If this is happening regularly, you should consider scaling up the parallelism. Or it may be that the egress is under-provisioned.
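As a rough sketch, the default parallelism can also be raised in flink-conf.yaml (the value 8 here is only an example; choose it based on your load and the task slots you actually have available):
parallelism.default: 8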
I see you've already tried buffer debloating (which should help). You can also try enabling unaligned checkpoints.
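In Flink 1.15 (the version the docs below cover), unaligned checkpoints are enabled with a flink-conf.yaml flag along these lines (the key name may differ in other releases):
execution.checkpointing.unaligned: true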
See https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpointing_under_backpressure/ for more information.
Upvotes: 1