Reputation: 2519
We consistently observe this behavior with interactive Spark jobs, e.g. in spark-shell or when running sparklyr from RStudio.
Say I launch spark-shell in yarn-client mode and perform an action that triggers several stages in a job and consumes x cores and y MB of memory. Once the job finishes, while the corresponding Spark session is still active, the allocated cores and memory are not released. Is this normal behavior?
Until the corresponding Spark session ends, the ResourceManager endpoint ip:8088/ws/v1/cluster/apps/application_1536663543320_0040/ keeps showing the same allocation (y x z).
I would have assumed YARN would reassign these unused resources to other Spark jobs that are waiting for resources. Please clarify if I am missing something here.
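For reference, this is roughly how the allocation can be watched while the shell sits idle (a sketch, assuming the default ResourceManager web port 8088; the host name is a placeholder, and allocatedMB / allocatedVCores / runningContainers are the fields of interest in the REST response):

    # Poll the YARN ResourceManager REST API for the application's current allocation.
    # <rm-host> is a placeholder for the ResourceManager host.
    curl -s "http://<rm-host>:8088/ws/v1/cluster/apps/application_1536663543320_0040" \
      | python -m json.tool \
      | grep -E '"allocatedMB"|"allocatedVCores"|"runningContainers"'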
Upvotes: 2
Views: 2697
Reputation: 2214
You need to tune the dynamic allocation configs (https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation):

- spark.dynamicAllocation.executorIdleTimeout: set this to a smaller value, say 10s (the default is 60s). It tells Spark to release an executor only after it has been idle for that long.
- spark.dynamicAllocation.initialExecutors / spark.dynamicAllocation.minExecutors: set these to a small number, say 1 or 2. The application will never scale down below this number unless the SparkSession is closed.

Once you set these two configs, your application should release the extra executors after they have been idle for 10 seconds.
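For example, a minimal spark-shell launch along these lines (a sketch, assuming the external shuffle service is running on the NodeManagers, which dynamic allocation requires; adjust the numbers to your cluster):

    # Launch spark-shell on YARN with dynamic allocation and a short idle timeout.
    spark-shell \
      --master yarn \
      --deploy-mode client \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.initialExecutors=1 \
      --conf spark.dynamicAllocation.minExecutors=1 \
      --conf spark.dynamicAllocation.executorIdleTimeout=10s

With this, YARN should reclaim the containers of idle executors after roughly 10 seconds even though the shell session stays open.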
Upvotes: 1
Reputation: 668
Yes, the resources stay allocated for as long as the SparkSession is active. To handle this better, you can use dynamic allocation.
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-dynamic-allocation.html
Upvotes: 0