F Baig

Reputation: 357

Flink filter before partition

Apache Flink uses a DAG-style lazy processing model, similar to Apache Spark (correct me if I'm wrong). That being said, if I use the following code:

DataStream<Element> data = ...;
DataStream<Element> res = data.filter(...).keyBy(...).timeWindow(...).apply(...);

.keyBy() converts the DataStream into a KeyedStream and distributes it among the Flink worker nodes.

My question is, how will Flink handle the filter here? Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the keyed stream contains only the Elements that pass the filter criteria?

Upvotes: 0

Views: 389

Answers (2)

David Anderson

Reputation: 43439

Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the keyed stream contains only the Elements that pass the filter criteria?

Yes, that's right. The only thing I'd add, for clarity, is that the original stream data will typically already be distributed (parallel) at the source. The filter will therefore be applied in parallel, across multiple tasks, after which the keyBy will repartition/redistribute the stream among the workers.
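To make the ordering concrete, here is a minimal plain-Java sketch (not Flink's actual runtime code) of what happens conceptually: each source task filters its local records first, and only the survivors are routed to downstream tasks by a hash of their key, the way keyBy shuffles data. The parallelism of 2, the `targetTask` routing function, and the "first letter" key are all assumptions made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy illustration only: filter runs locally on each source task's data,
// and only the surviving records are hash-partitioned by key, mirroring
// how filter(...) executes before the keyBy(...) network shuffle.
public class FilterBeforeKeyBy {

    static final int PARALLELISM = 2; // assumed downstream parallelism

    // keyBy-style routing: a hash of the key picks the target task
    static int targetTask(String key) {
        return Math.abs(key.hashCode() % PARALLELISM);
    }

    public static void main(String[] args) {
        // records held by one parallel source task
        List<String> sourcePartition = List.of("apple", "banana", "avocado", "cherry");

        Map<Integer, List<String>> shuffled = sourcePartition.stream()
                .filter(w -> w.startsWith("a"))         // filter applied locally first
                .collect(Collectors.groupingBy(        // then records routed by key
                        w -> targetTask(w.substring(0, 1))));

        // only the filtered records are "sent over the network" to the keyed tasks
        int recordsShipped = shuffled.values().stream().mapToInt(List::size).sum();
        System.out.println(recordsShipped); // prints 2
    }
}
```

The point of the sketch: because the filter sits upstream of the repartitioning step in the DAG, records dropped by the filter never cross the network at all.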

You can use Flink's web UI to examine a visualization of the execution graph produced from your job.

Upvotes: 1

TobiSH

Reputation: 2921

From my understanding, the filter is applied before the keyBy. As you said, it is a DAG (the D stands for Directed). Do you see any indication that this is not the case?

Upvotes: 0
