bjorndv

Reputation: 623

Submit a PySpark job to a cluster with Spark --jars option

I would like to be able to specify the --jars option (as with spark-submit) when I submit a PySpark job to a cluster. However, this option is not supported. Is there an alternative?

Upvotes: 3

Views: 460

Answers (1)

Dennis Huo

Reputation: 10677

Thanks for raising this issue; it appears you've found a bug where we haven't yet wired up the necessary flag. The intent is indeed to provide a --jars option in both the console GUI and in gcloud beta dataproc jobs submit pyspark, and we hope to deploy a fix in the next minor release within a few weeks.

In the meantime, you can try simply placing any jarfile dependencies into /usr/lib/hadoop/lib/ on your master and/or worker nodes, possibly using initialization actions to automate downloading the jarfiles at cluster-deployment time; they will then be available on the classpath of your Spark (and Hadoop) jobs automatically, as in the sketch below.
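For example, an initialization action can be a short shell script along the following lines. This is only a sketch: the bucket, script, and jar names are placeholders, not something from the original answer.

    #!/bin/bash
    # Hypothetical initialization action: runs on every node at cluster-deployment
    # time and copies a dependency jar onto the Hadoop classpath.
    # Replace gs://my-bucket/libs/my-dependency.jar with your own jar location.
    set -e
    gsutil cp gs://my-bucket/libs/my-dependency.jar /usr/lib/hadoop/lib/

You would then stage the script in Cloud Storage and reference it when creating the cluster, for instance:

    # Placeholder names again; adjust the bucket, script, and cluster name.
    gsutil cp add-jar.sh gs://my-bucket/init/add-jar.sh
    gcloud beta dataproc clusters create my-cluster \
        --initialization-actions gs://my-bucket/init/add-jar.sh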

Upvotes: 1
