djnetwork

Reputation: 11

How to dynamically add dependencies to spark executors at runtime

I would like to add archive dependencies to my Spark executors in a way that works similarly to passing archive paths to spark-submit with the --archives option. However, I won't know which dependencies are required until runtime, so I need to do this programmatically after the Spark job has already been submitted.

Is there a way to do this? I'm currently working on a hacky solution (roughly sketched below) where I download the required archives from within the function running on the executors, but this is much slower than having the driver download the archives once and then distribute them to the executors.
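For reference, my current workaround looks roughly like this; the archive URL, local path, and archive name are just placeholders:

    import os
    import tarfile
    import urllib.request

    def process_partition(rows):
        # Hacky workaround: every executor downloads and unpacks the archive
        # itself inside the task, instead of the driver shipping it once.
        archive_url = "https://example.com/deps/model.tar.gz"  # placeholder
        local_dir = "/tmp/deps"
        if not os.path.exists(local_dir):
            os.makedirs(local_dir, exist_ok=True)
            archive_path = os.path.join(local_dir, "model.tar.gz")
            urllib.request.urlretrieve(archive_url, archive_path)
            with tarfile.open(archive_path) as tar:
                tar.extractall(local_dir)
        for row in rows:
            # ... use the unpacked dependencies here ...
            yield row

    # rdd.mapPartitions(process_partition)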

Upvotes: 1

Views: 880

Answers (1)

cruzlorite

Reputation: 450

Assuming your resource manager is YARN, it is possible to set the property spark.yarn.dist.archives when creating the SparkSession.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("myappname") \
        .config("spark.yarn.dist.archives", "file1.zip#file1,file2.zip#file2,...") \
        .getOrCreate()

More info here: https://spark.apache.org/docs/latest/running-on-yarn.html

You may find the properties spark.yarn.dist.files and spark.yarn.dist.jars useful too.
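For example, a minimal sketch combining the three properties; the HDFS paths, the "env" alias, and the file names are just placeholders. On YARN, each archive is unpacked into the executor's working directory under the name given after "#", so executor-side code can reach its contents with a relative path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("myappname") \
        .config("spark.yarn.dist.archives", "hdfs:///deps/env.zip#env") \
        .config("spark.yarn.dist.files", "hdfs:///deps/config.json") \
        .config("spark.yarn.dist.jars", "hdfs:///deps/extra.jar") \
        .getOrCreate()

    def uses_dependency(x):
        # "env" is the alias from env.zip#env, unpacked in the container's
        # working directory; the file name inside it is a placeholder.
        with open("env/some_file.txt") as f:
            _ = f.read()
        return x

    # spark.sparkContext.parallelize(range(4)).map(uses_dependency).collect()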

Upvotes: 3
