Reputation: 23
I have run my Flink program (written in Scala) both in my IDE (IntelliJ) and on a standalone cluster. The program prints its running time: I got 20 s when running in the IDE and 74 s when running on the standalone cluster. I am confused about why it takes so much longer on a cluster with a parallelism of 10. I am basically trying to compare Flink's performance with Spark's. Can someone help me understand how this can happen? Thank you.
Added:
A sample of my program can be found here. The time printed in the console for this particular code is as below:
Config for the Flink standalone cluster that I've changed:
Run the Flink jar: flink run --class flinkutils.generated.Test2Agg2Spark ./target/scala-2.12/executorflink_2.12-0.1.jar
Upvotes: 0
Views: 219
Reputation: 43717
One factor affecting performance is that when the job runs in the IDE, everything runs within a single JVM and data is shipped around in memory, whereas on the cluster the data goes through the TCP stack.
But this is a complex scenario, and many other factors may also be negatively impacting performance.
FWIW, Flink SQL gets good performance on the TPC-H benchmark (if properly configured).
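When comparing the two numbers, it can also help to be explicit about what the printed time covers. A wall-clock timer wrapped around the call that runs the job includes client-side work such as job submission, not only the execution on the cluster. Below is a minimal sketch of such a timer in plain Scala. The `TimingSketch` object and the `timed` helper are hypothetical names introduced for illustration, and the summation is only a stand-in for the real job.

```scala
object TimingSketch {
  // Hypothetical helper: evaluates `body` once and returns
  // (result, elapsed wall-clock time in milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload (assumption). In the real program this would be
    // the call that runs the Flink job, e.g. env.execute(...).
    val (sum, ms) = timed { (1L to 1000000L).sum }
    println(s"result=$sum wallClockMs=$ms")
  }
}
```

If the job is started with `env.execute(...)`, the returned `JobExecutionResult` also exposes `getNetRuntime()`, which can be printed next to the wall-clock value to see how much of the difference comes from the job itself.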
Upvotes: 1