Reputation: 868
It is my understanding that Apache Flink does not actually run the operations you ask it to perform until the result of those operations is needed for something. This makes it difficult to time exactly how long each operation takes, which is what I am trying to do in order to compare its efficiency with Apache Spark. Is there a way to force it to run the operations when I want it to?
Upvotes: 0
Views: 95
Reputation: 13346
When running a Flink program, one defines the topology and the operators to be executed on a cluster. The job execution is triggered by calling env.execute, where env is either an ExecutionEnvironment or a StreamExecutionEnvironment. The one exception for batch jobs is the API calls collect and print, which trigger an eager execution.
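For timing a whole job, a minimal sketch of a batch program could look like the following. The element values, job name, and the output path /tmp/flink-timing-output are just placeholders of my own choosing; the point is that execute() blocks until the job has finished, and the returned JobExecutionResult also reports the job's net runtime:

```java
import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class TimedBatchJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Define the topology; nothing is executed yet.
        DataSet<Integer> doubled = env
                .fromElements(1, 2, 3, 4, 5)
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) {
                        return value * 2;
                    }
                });

        // A sink is required before execute(); the output path is only an example.
        doubled.writeAsText("/tmp/flink-timing-output", FileSystem.WriteMode.OVERWRITE);

        // execute() triggers the job and blocks until it has finished.
        long start = System.nanoTime();
        JobExecutionResult result = env.execute("timing example");
        long wallClockMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Wall clock:  " + wallClockMs + " ms");
        System.out.println("Net runtime: " + result.getNetRuntime() + " ms");
    }
}
```

Timing around a collect() or print() call works the same way, since those calls trigger execution immediately.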
You could use the web UI to extract the runtime of the different operators. For each operator you can see when it was deployed and when it finished execution.
Upvotes: 1