Reputation: 61
We currently have a running Flink job containing keyed state, with max parallelism set to 128. As our data grows, we are concerned that 128 will no longer be enough. Is there a way to change the max parallelism by modifying the savepoint, or some other way to do this?
Upvotes: 0
Views: 1453
Reputation: 43707
You can use the State Processor API to accomplish this. You will read the state from a savepoint taken from the current job, and write that state into a new savepoint with increased max parallelism. https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/state_processor_api/
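A rough sketch of what that job looks like, assuming the Flink 1.17+ State Processor API. The uid `"my-operator-uid"`, the paths, the `MyKeyedState` type, and the `MyStateReaderFunction` / `MyStateBootstrapFunction` classes are placeholders you would implement for your own job's state; see the linked docs for the reader/bootstrap function interfaces.

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// 1. Read the keyed state out of the existing savepoint (max parallelism 128).
SavepointReader oldSavepoint = SavepointReader.read(
        env, "s3://bucket/savepoints/old", new HashMapStateBackend());
DataStream<MyKeyedState> state = oldSavepoint.readKeyedState(
        OperatorIdentifier.forUid("my-operator-uid"), new MyStateReaderFunction());

// 2. Bootstrap that state into a brand-new savepoint created with a
//    larger max parallelism (1024 here, purely as an example).
StateBootstrapTransformation<MyKeyedState> transformation = OperatorTransformation
        .bootstrapWith(state)
        .keyBy(s -> s.key)
        .transform(new MyStateBootstrapFunction());

SavepointWriter
        .newSavepoint(env, new HashMapStateBackend(), 1024)  // new max parallelism
        .withOperator(OperatorIdentifier.forUid("my-operator-uid"), transformation)
        .write("s3://bucket/savepoints/new");

env.execute("rescale-max-parallelism");
```

You then start the job (with the larger max parallelism configured) from the new savepoint.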
Your job should perform well if the maximum parallelism is (roughly) 4-5 times the actual parallelism. When the max parallelism is only somewhat higher than the actual parallelism, then you have some slots processing data from just one key group, and others handling two key groups, and that imbalance wastes resources.
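To see why, you can simulate Flink's key-group-to-subtask assignment, which (per `KeyGroupRangeAssignment`) sends key group `kg` to subtask `kg * parallelism / maxParallelism` using integer division. The class and method names below are just for illustration:

```java
import java.util.Arrays;

public class KeyGroupSkew {
    // Mirrors KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup
    static int operatorIndex(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Returns {min, max} key groups assigned to any one subtask
    static int[] minMaxKeyGroupsPerSubtask(int maxParallelism, int parallelism) {
        int[] counts = new int[parallelism];
        for (int kg = 0; kg < maxParallelism; kg++) {
            counts[operatorIndex(maxParallelism, parallelism, kg)]++;
        }
        return new int[]{
            Arrays.stream(counts).min().getAsInt(),
            Arrays.stream(counts).max().getAsInt()
        };
    }

    public static void main(String[] args) {
        // Max parallelism barely above parallelism: some subtasks get
        // 1 key group, others get 2 -- a 2x load imbalance.
        System.out.println(Arrays.toString(minMaxKeyGroupsPerSubtask(128, 100)));
        // Max parallelism ~5x parallelism: subtasks get 5 or 6 key
        // groups -- only a ~20% imbalance.
        System.out.println(Arrays.toString(minMaxKeyGroupsPerSubtask(512, 100)));
    }
}
```

With 100 subtasks and max parallelism 128 the skew is 1 vs 2 key groups per subtask; with max parallelism 512 it is 5 vs 6, which is why a 4-5x headroom evens things out.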
But going unnecessarily high exacts a performance penalty if you are using the heap-based state backend, since each key group adds per-group overhead. That's why the default is only 128, and why you don't want to set it to an extremely large value.
Upvotes: 2