Reputation: 357
Apache Flink uses a DAG-style lazy processing model similar to Apache Spark's (correct me if I'm wrong). That being said, if I use the following code:
DataStream<Element> data = ...;
DataStream<Element> res = data.filter(...).keyBy(...).timeWindow(...).apply(...);
.keyBy() converts the DataStream to a KeyedStream and distributes it among the Flink worker nodes.
My question is: how will Flink handle the filter here? Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the resulting DataStream is created only from Elements that pass the filter criteria?
Upvotes: 0
Views: 389
Reputation: 43439
Will the filter be applied to the incoming DataStream before the stream is partitioned/distributed, so that the resulting DataStream is created only from Elements that pass the filter criteria?
Yes, that's right. The only thing I might say differently would be to clarify that the original stream data will typically already be distributed (parallel) from the source. The filtering will be applied in parallel, across multiple tasks, after which the keyBy will repartition/redistribute the stream among the workers.
You can use Flink's web UI to examine a visualization of the execution graph produced from your job.
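Here is a minimal sketch of what such a job might look like, assuming a recent Flink version (where window(...) replaces the deprecated timeWindow(...)) and a plain Integer stream standing in for Element; the class name, key selector, and window size are just illustrative. Printing env.getExecutionPlan() emits the same streaming graph (as JSON) that the web UI visualizes, with the filter chained ahead of the keyBy repartition:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FilterBeforeKeyBy {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        env.fromElements(1, 2, 3, 4, 5, 6, 7, 8)
           // The filter runs in each parallel source/filter task, so elements
           // failing the predicate are dropped before any network shuffle.
           .filter(value -> value % 2 == 0)
           // keyBy introduces the hash repartition; only the surviving
           // elements are redistributed among the downstream workers.
           .keyBy(value -> value % 2)
           .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
           .reduce((a, b) -> a + b)
           .print();

        // Prints the streaming graph as JSON -- the same topology the web UI
        // visualizes; the filter appears before the keyBy/window operator.
        System.out.println(env.getExecutionPlan());

        // To actually run the job (and inspect the graph in the web UI instead):
        // env.execute("filter-before-keyBy");
    }
}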
Upvotes: 1
Reputation: 2921
From my understanding, the filter is applied before the keyBy. As you said, it is a DAG (the D stands for Directed). Do you see any indicators that suggest this is not the case?
Upvotes: 0