Reputation: 623
I would like to be able to specify the --jars PySpark submit option when I submit a PySpark job. However, this option does not appear to be supported. Is there an alternative?
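For reference, this is the kind of usage I'm after (the --jars flag below is what plain spark-submit accepts; the paths and cluster name are just placeholders):

    # Plain spark-submit supports this:
    spark-submit --jars /path/to/extra-dep.jar my_job.py

    # Desired, but --jars is not currently accepted here:
    gcloud beta dataproc jobs submit pyspark my_job.py \
        --cluster my-cluster \
        --jars /path/to/extra-dep.jar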
Upvotes: 3
Views: 460
Reputation: 10677
Thanks for raising this issue; it appears you've discovered a bug where we haven't yet wired up the necessary flag. The intent is indeed to provide a --jars option in both the console GUI and in gcloud beta dataproc jobs submit pyspark, and we hope to deploy a fix in the next minor release within a few weeks.
In the meantime, you can simply copy any jarfile dependencies into /usr/lib/hadoop/lib/ on your master node and/or your worker nodes, possibly using initialization actions to automate downloading the jarfiles at cluster-deployment time. Anything in that directory is automatically on the classpath of your Spark (and Hadoop) jobs.
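For example, an initialization action along these lines would stage the jars on every node at cluster-creation time (a minimal sketch; the bucket paths and script name are placeholders, not an official sample):

    #!/bin/bash
    # Hypothetical init action: copy jar dependencies onto each node so they
    # land on the default Spark/Hadoop classpath. Initialization actions run
    # on the master and all workers during cluster deployment.
    set -euxo pipefail

    # Placeholder path -- point this at wherever your jarfiles live in GCS.
    gsutil cp gs://your-bucket/deps/*.jar /usr/lib/hadoop/lib/

You would then reference the script when creating the cluster, e.g. gcloud beta dataproc clusters create your-cluster --initialization-actions gs://your-bucket/init/stage-jars.sh (check the current gcloud docs for the exact flag spelling in your CLI version).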
Upvotes: 1