bachrc
bachrc

Reputation: 1196

Is global state with multiple workers possible in Flink?

Everywhere in Flink docs I see that a state is individual to a map function and a worker. This seems to be powerful in a standalone approach, but what if Flink runs in a cluster ? Can Flink handle a global state where all workers could add data and query it ?

From Flink article on states :

For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.

However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.

Upvotes: 5

Views: 4846

Answers (1)

diegoreico
diegoreico

Reputation: 461

I think that Flink only supports state on operators and state on Keyed streams, if you need some kind of global state, you have to store and recover data into some kind of database/file system/shared memory and mix that data with your stream.

Anyways, in my experiece, with a good processing pipeline design and partitioning your data in the right way, in most cases you should be able to apply divide and conquer algorithms or MapReduce strategies to archive your needs

If you introduce in your system some kind of global state, that global state could be a great bottleneck. So try to avoid it at all cost.

Upvotes: 5

Related Questions