Pierre
Pierre

Reputation: 686

Flink Shared State and race conditions

Hey there I am having a hard time understanding how shared state (ValueState, ListState, ..) work in flink. If multiple instances of a task are running in parallel how does flink prevent race conditions?

in this example from the doc, if the operator is parallelized, how does flink guarantee that there are no race conditions between the read and update of the keyHasBeenSeen value?

public static class Deduplicator extends RichFlatMapFunction<Event, Event> {
    ValueState<Boolean> keyHasBeenSeen;

    @Override
    public void open(Configuration conf) {
        ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("keyHasBeenSeen", Types.BOOLEAN);
        keyHasBeenSeen = getRuntimeContext().getState(desc);
    }

    @Override
    public void flatMap(Event event, Collector<Event> out) throws Exception {
        if (keyHasBeenSeen.value() == null) {
            out.collect(event);
            keyHasBeenSeen.update(true);
        }
    }
}

Upvotes: 0

Views: 216

Answers (1)

David Anderson
David Anderson

Reputation: 43707

There isn't any shared state in Flink. Having shared state would add complexity and impair scalability.

The value and update methods are scoped to the key of the current event. For any given key, all events for that key are processed by the same instance of the operator/function. And all tasks (a task is a chain of operator/function instances) are single threaded.

By keeping things simple like this, there's nothing to worry about.

Upvotes: 1

Related Questions