
Reputation: 1388

Checkpoint Shared is Very Large

I am running my Flink app with a parallelism of 16. After 20 minutes, the shared checkpoint size has grown to 235MB. How can I handle this? Over a long run it will become very large.

Every task manager is an OpenShift pod.
Task managers: 4
Task slots per task manager: 4
CPU per task manager: 4 cores
Memory per task manager: 6GB
State backend: RocksDB
Incremental checkpointing: enabled
The image below is for a single task manager (pod).

Upvotes: 0

Answers (1)

David Anderson

Reputation: 43697

Flink will use only as much space for state as is required to do what you've asked it to do. If you are unhappy with the result, you need to somehow ask it to do less.

Here are some things you might do:

Make sure your application isn't leaking state. This can happen, for example, if you are using keyed state with an unbounded key space, and aren't clearing the state.
Establish a state retention interval (for the Table/SQL API).
Use State TTL to free unneeded state.
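
As a rough sketch of the last two suggestions, the configuration might look something like this. It assumes a Flink 1.x job where `StateTtlConfig.newBuilder` takes a `Time` (newer releases may expect a `java.time.Duration` instead). The class name, the state name `lastSeen`, and the one-hour and 1000-entry values are placeholders for illustration, not recommendations for your job.

```java
import java.time.Duration;

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical helper class; the names and durations are illustrative only.
public class StateRetentionSketch {

    // DataStream API: a state descriptor whose entries expire one hour
    // after they were last created or written.
    public static ValueStateDescriptor<Long> lastSeenDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.hours(1))
                // Restart the TTL clock on create and on every write.
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                // Never hand expired values back to user code.
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                // With RocksDB, drop expired entries during compaction;
                // the argument is how many entries are processed before
                // the filter refreshes its current timestamp.
                .cleanupInRocksdbCompactFilter(1000)
                .build();

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("lastSeen", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }

    // Table/SQL API: state for keys that have been idle for longer than
    // the retention interval becomes eligible for cleanup.
    public static void configureIdleStateRetention(TableEnvironment tableEnv) {
        tableEnv.getConfig().setIdleStateRetention(Duration.ofHours(1));
    }
}
```

The descriptor would then be passed to `getRuntimeContext().getState(...)` in a keyed function as usual. Note that TTL only bounds state that is allowed to expire; if the job logically needs everything it has stored, this will change its results.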

There are certain anti-patterns that require a lot of buffering in state. You should avoid those. :)

You could restrict the resources available for storing state, but this will result in the job failing when those resources are exhausted.

Also, 235MB across 16 slots isn't very large for RocksDB. With incremental checkpointing, RocksDB is storing multiple (uncompacted) copies of the state. The actual active state you're using could be much less.

Upvotes: 1
