Reputation: 472
I am running into some problems in (Py)Spark on EMR (release 5.32.0). Approximately a year ago I ran the same program on an EMR cluster (I think the release must have been 5.29.0). Back then I was able to configure my PySpark program properly using spark-submit arguments. However, now I am running the same/similar code, but the spark-submit arguments do not seem to have any effect.
My cluster configuration:
I run the program with the following spark-submit arguments:
spark-submit --master yarn --conf "spark.executor.cores=3" --conf "spark.executor.instances=40" --conf "spark.executor.memory=8g" --conf "spark.driver.memory=8g" --conf "spark.driver.maxResultSize=8g" --conf "spark.dynamicAllocation.enabled=false" --conf "spark.default.parallelism=480" update_from_text_context.py
I did not change anything in the default configurations on the cluster.
Below is a screenshot of the Spark UI, which indicates only 10 executors, whereas I expect to have 40 executors available...
I tried different spark-submit arguments to make sure the error was unrelated to Apache Spark: setting the number of executor instances does not change the number of executors. I tried a lot of things, and nothing seems to help.
I am a little lost here, could someone help?
UPDATE:
I ran the same code on EMR release label 5.29.0, and there the conf settings in the spark-submit arguments do seem to take effect:
Why is this happening?
Upvotes: 6
Views: 3159
Reputation: 1980
Sorry for the confusion, but this is intentional. On emr-5.32.0, Spark+YARN will coalesce multiple executor requests that land on the same node into a larger executor container. Note how even though you had fewer executors than you expected, each of them had more memory and cores than you had specified. (There's one asterisk here, though, that I'll explain below.)
This feature is intended to provide better performance by default in most cases. If you would really prefer to keep the previous behavior, you may disable this new feature by setting spark.yarn.heterogeneousExecutors.enabled=false, though we (I am on the EMR team) would like to hear from you about why the previous behavior is preferable.
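As a sketch, the flag could be added to the original spark-submit invocation like any other conf (the flag name is as stated above; the rest of the command is the one from the question):

```shell
spark-submit --master yarn \
  --conf "spark.yarn.heterogeneousExecutors.enabled=false" \
  --conf "spark.executor.cores=3" \
  --conf "spark.executor.instances=40" \
  --conf "spark.executor.memory=8g" \
  --conf "spark.driver.memory=8g" \
  --conf "spark.driver.maxResultSize=8g" \
  --conf "spark.dynamicAllocation.enabled=false" \
  --conf "spark.default.parallelism=480" \
  update_from_text_context.py
```

Alternatively, the same property can be set cluster-wide in the spark-defaults classification when creating the cluster.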
One thing that doesn't make sense to me, though, is that you should be ending up with the same total number of executor cores that you would have without this feature, but that doesn't seem to have occurred for the example you shared. You asked for 40 executors with 3 cores each but then got 10 executors with 15 cores each, which is a bit more in total. This may have to do with the way that your requested spark.executor.memory of 8g divides into the memory available on your chosen instance type, which I'm guessing is probably m5.4xlarge. One thing that may help you is to remove all of your overrides for spark.executor.memory/cores/instances and just use the defaults. Our hope is that defaults will give the best performance in most cases. If not, like I said above, please let us know so that we can improve further!
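To make the mismatch concrete, here is the quick arithmetic behind "a bit more in total" (the 40×3 requested and 10×15 observed figures are taken from the example above):

```python
# Requested vs. observed executor resources from the example above.
requested_executors, requested_cores = 40, 3
observed_executors, observed_cores = 10, 15

requested_total = requested_executors * requested_cores  # 120 cores requested
observed_total = observed_executors * observed_cores     # 150 cores observed

print(requested_total)                   # 120
print(observed_total)                    # 150
print(observed_total - requested_total)  # 30 extra cores after coalescing
```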
Upvotes: 7
Reputation: 472
Ok, if someone is facing the same problem: as a workaround you can revert to a previous version of EMR. In my case I reverted to EMR release label 5.29.0, which solved all my problems. Suddenly I was able to configure the Spark job again!
Still, I am not sure why it doesn't work in EMR release label 5.32.0, so if someone has suggestions, please let me know!
Upvotes: -1