Reputation: 1789
I have at my disposal ~100 CPU nodes, each with 192 cores and 1.5TB RAM. I am running some large Spark jobs (each on 40 instances), but I'm not sure of the best way to tune the Spark parameters. This is what I typically use, taken from some Spark tutorials:
--conf spark.driver.cores=16 \
--conf spark.driver.memory="256G" \
--conf spark.driver.maxResultSize="16g" \
--conf spark.sql.shuffle.partitions=4000 \
--conf spark.kubernetes.executor.limit.cores=32 \
--conf spark.kubernetes.driver.limit.cores=32 \
--conf spark.executor.cores=15 \
--conf spark.executor.memory="256G" \
The Spark jobs run successfully, but they take a very long time, and I feel the parameters don't reflect the hardware I have. Any suggestions?
If I understand correctly, this thread How to tune spark executor number, cores and executor memory? suggests setting only 5 cores per executor, which would mean I could run 38 executors per node? What I'm also struggling to understand is the relation between driver and executor cores (and also Kubernetes cores).
Upvotes: 0
Views: 81
Reputation: 3505
Your current Spark configuration looks like a good starting point. There is no definitive answer, since the best settings vary by workload, but here are some suggestions you can try and test when tuning your Spark parameters:
Optimizing Your Configuration:
Driver Cores: Consider reducing spark.driver.cores to 8-16 cores.
Executor Cores: Given your powerful nodes with 192 cores, you can likely increase spark.executor.cores. Experiment with values between 5 and 15 to find the optimal setting for your workload.
Executor Memory: While you have ample RAM, adjust spark.executor.memory based on the memory footprint of your tasks. As a starting point, try keeping the total executor memory per node around 60-70% of the node's RAM (roughly 1TB) to leave space for the operating system and other processes. Monitor Spark UI memory metrics and adjust accordingly.
Shuffle Partitions: spark.sql.shuffle.partitions controls how many partitions are used when data is shuffled for wide operations such as joins and aggregations. With 100 nodes, 4000 partitions might be a bit low. You can increase it gradually, but be mindful of the added overhead. Use Spark UI shuffle metrics to monitor performance and adjust if needed. One way these suggestions could be translated into submit flags is sketched right after this list.
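For illustration only, a set of flags following the suggestions above might look like the snippet below. The concrete numbers (5 cores and 20g heap + 4g overhead per executor, an 8-core/64g driver) are assumptions to validate against your own workload in the Spark UI, not recommendations:
# illustrative values only -- check the Spark UI and adjust for your workload
--conf spark.driver.cores=8 \
--conf spark.driver.memory="64g" \
--conf spark.driver.maxResultSize="16g" \
--conf spark.executor.cores=5 \
--conf spark.executor.memory="20g" \
--conf spark.executor.memoryOverhead="4g" \
--conf spark.kubernetes.executor.limit.cores=5 \
--conf spark.kubernetes.driver.limit.cores=8 \
--conf spark.sql.shuffle.partitions=4000 \
With 5 cores per executor this allows roughly 38 executor pods per node (192 / 5), and 38 x (20g + 4g) = 912g, which stays within the ~1TB per-node budget mentioned above.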
Understanding Driver and Executor Cores:
Driver Cores: The driver controls the overall Spark application. While some cores are needed, setting it too high can steal resources from executors. A value of 8-16 cores is often sufficient.
Executor Cores: Executors are responsible for running tasks that process your data. More cores per executor can improve processing speed, but ensure you don't overcommit resources. Aim for 5-15 cores per executor based on your workload.
Kubernetes Cores: If you're using Kubernetes for resource management, the spark.kubernetes.executor.limit.cores and spark.kubernetes.driver.limit.cores settings specify the maximum CPU an executor or driver pod may use. These should be aligned with your executor and driver core allocations, respectively; a small sketch follows.
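As a minimal sketch of that alignment, reusing the illustrative 5-core executor from above (spark.kubernetes.executor.request.cores is optional and shown only to make the request vs. limit distinction explicit):
# keep the pod's CPU limit in line with the cores Spark will actually schedule on
--conf spark.executor.cores=5 \
--conf spark.kubernetes.executor.request.cores=5 \
--conf spark.kubernetes.executor.limit.cores=5 \
If the limit is set lower than spark.executor.cores, Spark will still schedule that many concurrent tasks per executor, but the container runtime will throttle the pod's CPU.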
Additional Tips:
Monitoring and Tuning: As mentioned above, use the Spark UI to track resource utilization (CPU, memory, network), task completion times, and shuffle statistics. These will help you identify bottlenecks and further refine your configuration. A small sketch for keeping that UI data after a job finishes is included below.
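If you also want that UI data to survive after an application finishes (so runs can be compared in the History Server), event logging can be enabled; the directory below is a placeholder, not a real path:
# illustrative; point spark.eventLog.dir at storage your cluster can actually reach
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir="s3a://your-bucket/spark-events" \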
Upvotes: 0