Reputation: 11
I would like to add archive dependencies to my Spark executors in a way that works similarly to passing archive paths to spark-submit with the --archives option. However, I won't know which dependencies are required until runtime, so I need to do this programmatically after the Spark job has already been submitted.
Is there a way to do this? I'm currently working on a hacky solution where I download the required archives from within the function running on the executors; however, this is much slower than having the driver download the archives once and then distribute them to the executors.
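For reference, my current workaround looks roughly like the sketch below (the URL and paths are illustrative, not my real ones):

import os
import urllib.request
import zipfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("runtime-deps").getOrCreate()

ARCHIVE_URL = "https://example.com/deps.zip"  # illustrative URL

def process_partition(rows):
    # Naive per-executor download: every executor (and potentially every
    # task) fetches and unpacks the archive itself, which is what makes
    # this so much slower than driver-side distribution.
    local = "/tmp/deps.zip"
    if not os.path.exists(local):
        urllib.request.urlretrieve(ARCHIVE_URL, local)
        with zipfile.ZipFile(local) as zf:
            zf.extractall("/tmp/deps")
    for row in rows:
        yield row  # work that needs the unpacked archive goes here

spark.sparkContext.parallelize(range(100)).mapPartitions(process_partition).collect()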
Upvotes: 1
Views: 880
Reputation: 450
Assuming your resource manager is YARN, it is possible to set the property spark.yarn.dist.archives when creating the SparkSession:
from pyspark.sql import SparkSession

# each entry has the form archive#nameToUnpackUnder
spark = SparkSession.builder \
    .appName("myappname") \
    .config("spark.yarn.dist.archives", "file1.zip#file1,file2.zip#file2,...") \
    .getOrCreate()
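Note that the spark.yarn.dist.* properties are consumed when the YARN application is launched, so set them before any SparkContext has been created in your process; getOrCreate() will not apply them to an already-running session.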
More info here: https://spark.apache.org/docs/latest/running-on-yarn.html
You may find the properties spark.yarn.dist.files and spark.yarn.dist.jars useful too.
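As a minimal illustration of how the alias in file1.zip#file1 behaves (reusing the session built above): YARN unpacks the archive into a directory named after the alias in each container's working directory, so executor code can reach it by relative path:

import os

def check_deps(rows):
    # With spark.yarn.dist.archives set to "file1.zip#file1", YARN unpacks
    # the archive into a directory named "file1" in the working directory
    # of each executor's container.
    deps_dir = os.path.join(os.getcwd(), "file1")
    for row in rows:
        yield (row, os.path.isdir(deps_dir))

spark.sparkContext.parallelize(range(4)).mapPartitions(check_deps).collect()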
Upvotes: 3