Singh3y

Reputation: 381

Flink Statefun Under Backpressure application crashes

I'm reading data from a Kafka topic that has a lot of data. Once Flink starts reading, it starts up fine, then crashes after some time when backpressure hits 100%, and then goes into an endless cycle of restarts.

My question is: shouldn't Flink's backpressure mechanism come into play and reduce consumption from the topic until the in-flight data is consumed or the backpressure drops, as described in this article: https://www.ververica.com/blog/how-flink-handles-backpressure? Or is there some config I need to set that I'm missing? Is there any other way to avoid this restart cycle when backpressure increases?

I've tried these configs:

taskmanager.network.memory.buffer-debloat.enabled: true
taskmanager.network.memory.buffer-debloat.samples: 5

My modules.yaml has this config for the transport:

spec:
  functions: function_name
  urlPathTemplate: http://nginx:8888/function_name
  maxNumBatchRequests: 50
  transport:
    timeouts:
      call: 2 min
      connect: 2 min
      read: 2 min
      write: 2 min

Upvotes: 0

Views: 393

Answers (1)

David Anderson

Reputation: 43707

You should look in the logs to determine what exactly is causing the crash and restart, but typically when backpressure is involved in a restart it's because a checkpoint timed out. You could increase the checkpoint timeout.
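As a sketch, the checkpoint timeout can be raised in flink-conf.yaml (the 30-minute value below is purely illustrative; tune it to how long your checkpoints actually take under load):

```yaml
# flink-conf.yaml -- give checkpoints more time to complete under backpressure
# (value is illustrative, not a recommendation)
execution.checkpointing.timeout: 30 min
```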

However, it's better if you can reduce or eliminate the backpressure. One common cause of backpressure is not giving Flink enough resources to keep up with the load. If this is happening regularly, you should consider scaling up the parallelism. Or it may be that the egress is under-provisioned.
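For example, the default parallelism can be raised in flink-conf.yaml (the value below is illustrative; it should be sized against your topic's partition count and the available task slots):

```yaml
# flink-conf.yaml -- default parallelism applied to jobs that don't set their own
# (illustrative value; match it to partitions and task slots)
parallelism.default: 8
```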

I see you've already tried buffer debloating (which should help). You can also try enabling unaligned checkpoints.
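A minimal sketch of enabling unaligned checkpoints in flink-conf.yaml, which lets checkpoint barriers overtake buffered in-flight data at the cost of larger checkpoints (the aligned-checkpoint-timeout line is optional and its value is illustrative):

```yaml
# flink-conf.yaml -- let barriers overtake in-flight buffers under backpressure
execution.checkpointing.unaligned: true
# optionally: stay aligned, and only switch to unaligned once a barrier
# has been stuck for this long (illustrative value)
execution.checkpointing.aligned-checkpoint-timeout: 30 s
```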

See https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/checkpointing_under_backpressure/ for more information.

Upvotes: 1
