Reputation: 686
Hey there I am having a hard time understanding how shared state (ValueState, ListState, ..) work in flink. If multiple instances of a task are running in parallel how does flink prevent race conditions?
in this example from the doc, if the operator is parallelized, how does flink guarantee that there are no race conditions between the read and update of the keyHasBeenSeen value?
public static class Deduplicator extends RichFlatMapFunction<Event, Event> {
ValueState<Boolean> keyHasBeenSeen;
@Override
public void open(Configuration conf) {
ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("keyHasBeenSeen", Types.BOOLEAN);
keyHasBeenSeen = getRuntimeContext().getState(desc);
}
@Override
public void flatMap(Event event, Collector<Event> out) throws Exception {
if (keyHasBeenSeen.value() == null) {
out.collect(event);
keyHasBeenSeen.update(true);
}
}
}
Upvotes: 0
Views: 216
Reputation: 43707
There isn't any shared state in Flink. Having shared state would add complexity and impair scalability.
The value
and update
methods are scoped to the key of the current event. For any given key, all events for that key are processed by the same instance of the operator/function. And all tasks (a task is a chain of operator/function instances) are single threaded.
By keeping things simple like this, there's nothing to worry about.
Upvotes: 1