F Baig

Reputation: 357

Flink filter before partition

Apache Flink uses a DAG-style lazy processing model, similar to Apache Spark (correct me if I'm wrong). That being said, if I use the following code:

DataStream<Element> data = ...;
DataStream<Element> res = data.filter(...).keyBy(...).timeWindow(...).apply(...);

.keyBy() converts the DataStream into a KeyedStream and distributes it among the Flink worker nodes.

My question is, how will Flink handle the filter here? Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the keyed stream contains only the Elements that pass the filter criteria?

Upvotes: 0

Views: 389

Answers (2)

David Anderson

Reputation: 43439

Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the keyed stream contains only the Elements that pass the filter criteria?

Yes, that's right. The only thing I'd add, for clarity, is that the original stream data will typically already be distributed (parallel) at the source. The filter will therefore be applied in parallel, across multiple tasks, after which the keyBy will repartition/redistribute the stream among the workers.
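To make the ordering concrete, here is a minimal plain-Java sketch (not Flink's actual runtime code) of what happens conceptually: each source task filters its local records first, and only the survivors are routed to downstream tasks by a hash of their key, the way keyBy shuffles data. The parallelism of 2, the `targetTask` routing function, and the "first letter" key are all assumptions made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy illustration only: filter runs locally on each source task's data,
// and only the surviving records are hash-partitioned by key, mirroring
// how filter(...) executes before the keyBy(...) network shuffle.
public class FilterBeforeKeyBy {

    static final int PARALLELISM = 2; // assumed downstream parallelism

    // keyBy-style routing: a hash of the key picks the target task
    static int targetTask(String key) {
        return Math.abs(key.hashCode() % PARALLELISM);
    }

    public static void main(String[] args) {
        // records held by one parallel source task
        List<String> sourcePartition = List.of("apple", "banana", "avocado", "cherry");

        Map<Integer, List<String>> shuffled = sourcePartition.stream()
                .filter(w -> w.startsWith("a"))         // filter applied locally first
                .collect(Collectors.groupingBy(        // then records routed by key
                        w -> targetTask(w.substring(0, 1))));

        // only the filtered records are "sent over the network" to the keyed tasks
        int recordsShipped = shuffled.values().stream().mapToInt(List::size).sum();
        System.out.println(recordsShipped); // prints 2
    }
}
```

The point of the sketch: because the filter sits upstream of the repartitioning step in the DAG, records dropped by the filter never cross the network at all.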

You can use Flink's web UI to examine a visualization of the execution graph produced from your job.

Upvotes: 1

TobiSH

Reputation: 2921

From my understanding, the filter is applied before the keyBy. As you said, it is a DAG (the D stands for Directed). Do you see any indication that this is not the case?

Upvotes: 0
