sclee1

Reputation: 1281

State size more than 500Gb in Flink

I am wondering whether it is okay to use Flink when the state size in the state backend is more than 500GB. RocksDB can handle data larger than memory, but I think searching through such a huge state would be very expensive. I know that RocksDB's caching can reduce lookup time, but I am wondering whether there are real Flink use cases that store this much data in state.

My scenario is as follows.

I would use MapState for the state. The data stored in the state will be key-value pairs. Each incoming record will be preprocessed using a config value and then appended to the state.

Thanks.

Upvotes: 0

Views: 535

Answers (1)

kkrugler

Reputation: 9245

If you can use incremental checkpointing, and the amount of state that changes per checkpoint interval is significantly less than 500GB, then yes, that can work.
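For reference, incremental checkpointing with the RocksDB backend can be enabled in `flink-conf.yaml`; this is a minimal sketch, and the checkpoint directory path is a placeholder you would replace with your own durable storage location:

```yaml
# Use RocksDB as the state backend (keeps state on disk, not just heap).
state.backend: rocksdb

# Only upload the RocksDB SST files created since the last checkpoint,
# instead of the full 500GB+ state on every checkpoint.
state.backend.incremental: true

# Placeholder: point this at your durable checkpoint storage (S3, HDFS, ...).
state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```

The same can be done in code via `StreamExecutionEnvironment.setStateBackend(new EmbeddedRocksDBStateBackend(true))` in recent Flink versions.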

With the RocksDB state backend, MapState is especially efficient, as changing/adding/deleting an entry doesn't impact any other state in the map.
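To illustrate why that is, here is a toy model (plain Java with a `TreeMap`, not Flink's actual implementation) of how the RocksDB backend lays out MapState: each user map entry becomes its own RocksDB key/value pair, prefixed by the keyed-stream key, so writing or deleting one entry touches only that single key and never rewrites the rest of the map. The class and key-format names here are illustrative assumptions:

```java
import java.util.TreeMap;

// Toy model of RocksDB-backed MapState. A TreeMap stands in for RocksDB's
// sorted key space; the real backend prefixes entries with
// (key group, stream key, namespace, user map key) in much the same way.
public class MapStateModel {
    private final TreeMap<String, String> rocksDb = new TreeMap<>();

    // Composite key: <stream key>#<user map key>.
    private String compositeKey(String streamKey, String userKey) {
        return streamKey + "#" + userKey;
    }

    public void put(String streamKey, String userKey, String value) {
        // Writes exactly one RocksDB key; other entries are untouched.
        rocksDb.put(compositeKey(streamKey, userKey), value);
    }

    public String get(String streamKey, String userKey) {
        return rocksDb.get(compositeKey(streamKey, userKey));
    }

    public void remove(String streamKey, String userKey) {
        // Deletes exactly one RocksDB key.
        rocksDb.remove(compositeKey(streamKey, userKey));
    }

    public int entryCount() {
        return rocksDb.size();
    }
}
```

This per-entry layout is also what makes incremental checkpoints effective: only the RocksDB files containing changed entries need to be uploaded, regardless of how large the full map is.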

Upvotes: 1
