Reputation: 73
I currently have a Flink job that uses the broadcast state pattern, connecting a broadcast stream to an event stream to provide context for decision making. The broadcast data consists of relatively small objects arriving on a medium-throughput stream.
Since broadcast state can only be modified on the broadcast side, the current method for cleaning this data to keep it from growing unboundedly (since it is stored in memory) is to iterate through the entire state in the processBroadcastElement method, removing entries that meet certain criteria based on specific object properties.
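To make the cleanup pattern concrete, here is a minimal plain-Java sketch of the same full-scan eviction (the `ContextValue` record, its `expiresAtMillis` field, and the eviction criterion are hypothetical stand-ins, not the actual job's types; real Flink code would iterate the broadcast `MapState` instead of a `HashMap`):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class BroadcastCleanupSketch {
    // Hypothetical broadcast value: a payload plus an expiry timestamp.
    record ContextValue(String payload, long expiresAtMillis) {}

    // Full scan over the state, removing expired entries -- the same
    // O(state size) sweep done inside processBroadcastElement.
    static void evictExpired(Map<String, ContextValue> state, long nowMillis) {
        Iterator<Map.Entry<String, ContextValue>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().expiresAtMillis() <= nowMillis) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, ContextValue> state = new HashMap<>();
        state.put("a", new ContextValue("keep", 2_000L));
        state.put("b", new ContextValue("drop", 500L));
        evictExpired(state, 1_000L);
        System.out.println(state.keySet()); // prints [a]
    }
}
```

The cost of this approach is that the scan runs once per broadcast element, so during a backload of hundreds of thousands of elements the total work is roughly quadratic in the backlog size.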
Under normal operation this works fine and does not seem to strain processing power. However, there have been a couple of situations where the state needs to be cleared and backloaded with hundreds of thousands of broadcast stream objects (currently adding up to about 15 MB * 2 parallel instances in checkpoint size). In these situations, the Co-Process-Broadcast operator immediately becomes 100% busy, causing 100% backpressure on both data sources.
A few potential solutions that might be better:

1. MapState for the currently broadcast data plus a keyed event stream, so I can access the state in a rich map function and clean the state there if needed.
2. MapState for the currently broadcast data plus a keyed event stream, so I can access the state in a rich map function and clean the full state on a timer interval in a keyed process function on the event stream.

I'm looking for feedback on which option is the most "correct" pattern for this situation, and which would be the most performant.
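For the timer-interval idea, a rough plain-Java sketch of the intended behavior (names like `sweepIntervalMillis`, `processElement`, `onTimer`, and the `ContextValue` record are illustrative assumptions, not Flink API; in a real KeyedProcessFunction the sweep would be triggered by a registered processing-time timer rather than checked inline):

```java
import java.util.HashMap;
import java.util.Map;

public class TimerSweepSketch {
    // Hypothetical state value: a payload plus an expiry timestamp.
    record ContextValue(String payload, long expiresAtMillis) {}

    private final Map<String, ContextValue> mapState = new HashMap<>();
    private final long sweepIntervalMillis;
    private long nextSweepAt;

    TimerSweepSketch(long sweepIntervalMillis) {
        this.sweepIntervalMillis = sweepIntervalMillis;
        this.nextSweepAt = sweepIntervalMillis;
    }

    // Called per element: the state update itself is O(1); no full scan here.
    void processElement(String key, ContextValue value, long nowMillis) {
        mapState.put(key, value);
        if (nowMillis >= nextSweepAt) { // stand-in for a timer firing
            onTimer(nowMillis);
            nextSweepAt = nowMillis + sweepIntervalMillis;
        }
    }

    // Full-state sweep runs at most once per interval, so the scan cost is
    // amortized across many elements instead of paid on every one.
    void onTimer(long nowMillis) {
        mapState.entrySet().removeIf(e -> e.getValue().expiresAtMillis() <= nowMillis);
    }

    int size() { return mapState.size(); }
}
```

The point of the sketch is the cost model: option 2 decouples cleanup frequency from element throughput, which is exactly what helps during a backload.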
Upvotes: 0
Views: 39