Ahmad.S

Reputation: 809

What is the best way to share a dataset between the nodes in Apache Flink?

I am using Apache Flink to process a stream of data, and I need to share an index between all the nodes that process the input data. The nodes update the index frequently.

Is it good practice, from an efficiency standpoint, to share the dataset through broadcast variables?

Will a broadcast variable be updated on all nodes after each update?

Does Apache Flink update broadcast variables incrementally, sending only the recent changes?

Upvotes: 5

Views: 565

Answers (1)

Eron Wright

Reputation: 1060

I think the solution lies in using stateful functions based on Flink's managed state descriptors. If the state isn't partitionable, set the parallelism to one for your operator.
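
To illustrate the idea, here is a minimal, dependency-free sketch in plain Java. It is not Flink API: the class `PartitionedIndex` and its methods are hypothetical names made up for this example. It models an index that is split by key, so that each parallel instance owns its own slice and nothing has to be shared between nodes. In a real job, Flink's keyed managed state would play the role of the per-partition maps. Constructing it with a parallelism of 1 corresponds to the non-partitionable case, where a single instance holds the whole index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of key-partitioned state; NOT part of the Flink API. */
final class PartitionedIndex {
    // One private map per simulated parallel operator instance.
    private final List<Map<String, Long>> partitions = new ArrayList<>();

    PartitionedIndex(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Routes a key to exactly one partition, so only that partition
    // ever reads or writes the entry for this key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Adds delta to the entry for key inside the owning partition.
    void update(String key, long delta) {
        partitions.get(partitionFor(key)).merge(key, delta, Long::sum);
    }

    // Reads the entry for key from the owning partition (0 if absent).
    long lookup(String key) {
        return partitions.get(partitionFor(key)).getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        PartitionedIndex index = new PartitionedIndex(4);
        index.update("user-1", 2);
        index.update("user-2", 5);
        index.update("user-1", 3);
        System.out.println(index.lookup("user-1")); // prints 5
    }
}
```

If the index can be keyed this way, each update stays local to one instance, so the question of propagating changes to every node does not arise.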

Upvotes: 1
