Reputation: 80176
I have a Spark job that ran for 12 hours before it was cancelled. There are approximately 400,000 short-lived tasks (the maximum task duration is 26 seconds). Below is the Executors tab. I am running out of ideas on whether there are opportunities to reduce the overall processing time. What am I missing?
NOTE: I can add more executors, but I am wondering if there are other things I can do before I increase the number of executors. There are also no errors in the log!
Upvotes: 0
Views: 106
Reputation: 190
You may try setting 5 cores per executor instead of 8 for better HDFS throughput, via --conf spark.executor.cores=5, and increasing the number of executors so that the total core count stays at your desired value. As a rule of thumb, aim for 2-3 tasks (partitions) per CPU core in your cluster. That means with 3 executors of 5 cores each (15 cores total), you should ideally have around 30 to 45 tasks or partitions. Since the input size is around 6.8 GiB, the partition size should be around 154 to 232 MB, so you may try increasing spark.sql.files.maxPartitionBytes to 256 MB to balance parallelism and per-task efficiency. I also recommend increasing the driver memory so that garbage collection is easier to manage, especially if you are performing an action such as show or collect. A sketch of how these settings might be combined is shown below.
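For reference, here is one way these suggestions could be passed to spark-submit. The executor instance count, memory sizes, and application file are placeholders (not from your setup) that you would adjust to your cluster; only spark.executor.cores and spark.sql.files.maxPartitionBytes reflect the values discussed above.

    # Sketch only: pick spark.executor.instances so that
    # instances x cores matches your target total core count.
    # 268435456 bytes = 256 MB (up from the 128 MB default).
    spark-submit \
      --conf spark.executor.cores=5 \
      --conf spark.executor.instances=24 \
      --conf spark.executor.memory=16g \
      --conf spark.driver.memory=8g \
      --conf spark.sql.files.maxPartitionBytes=268435456 \
      your_job.py

After the change, you can sanity-check the partitioning by looking at the number of tasks per stage in the Spark UI; if you still see hundreds of thousands of tiny tasks, the bottleneck is likely how the input is split rather than the executor sizing.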
Upvotes: 0