Reputation: 868
It is my understanding that Apache Flink does not actually run the operations you ask it to perform until the result of those operations is needed for something. This makes it difficult to time exactly how long each operation takes, which is what I am trying to do in order to compare its efficiency with Apache Spark. Is there a way to force it to run the operations when I want it to?
Upvotes: 0
Views: 95
Reputation: 13346
When running a Flink program, one defines the topology and the operators to be executed on a cluster. The job execution is triggered by calling env.execute, where env is either an ExecutionEnvironment or a StreamExecutionEnvironment. The one exception for batch jobs is the API calls collect and print, which trigger an eager execution.
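For timing a whole job, a minimal sketch of a batch program could look like the following. The element values, job name, and the output path /tmp/flink-timing-output are just placeholders of my own choosing; the point is that execute() blocks until the job has finished, and the returned JobExecutionResult also reports the job's net runtime:

```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class TimedBatchJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Define the topology; nothing is executed yet.
        DataSet<Integer> doubled = env
                .fromElements(1, 2, 3, 4, 5)
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) {
                        return value * 2;
                    }
                });

        // A sink is required before execute(); the output path is only an example.
        doubled.writeAsText("/tmp/flink-timing-output", FileSystem.WriteMode.OVERWRITE);

        // execute() triggers the job and blocks until it has finished.
        long start = System.nanoTime();
        JobExecutionResult result = env.execute("timing example");
        long wallClockMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Wall clock:  " + wallClockMs + " ms");
        System.out.println("Net runtime: " + result.getNetRuntime() + " ms");
    }
}
```

Timing around a collect() or print() call works the same way, since those calls trigger execution immediately.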
You could use the web UI to extract the runtime of the different operators. For each operator you can see when it was deployed and when it finished execution.
Upvotes: 1