Reputation: 672
Is there any way to add external libraries like this one to HDFS? It seems PySpark needs external libs to be in a shared folder on HDFS. But since I am using a shell script that runs the PySpark script with external libraries, importing them fails.
See post here about ImportError.
Upvotes: 0
Views: 407
Reputation: 672
We installed the library on all worker nodes. Previously we had it only on the NameNode.
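For context, a minimal sketch of why this matters, assuming a hypothetical third-party module named extlib: any import that runs inside executor code (for example inside a map) is resolved on the worker that processes the partition, not on the driver, so the package must be available on every worker node.

from pyspark import SparkContext

sc = SparkContext(appName="importCheck")

def uses_external_lib(x):
    # This import runs on the worker processing the partition,
    # so "extlib" must be installed (or shipped) on every worker node.
    import extlib  # hypothetical external library
    return extlib.__name__

# Raises ImportError on the executors if extlib exists only on the driver/NameNode.
print(sc.parallelize(range(4)).map(uses_external_lib).distinct().collect())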
Upvotes: 0
Reputation: 207
You can add external libs with the --py-files
option. You can provide either a .py file or a .zip.
For example, using spark-submit:
spark-submit --master yarn --py-files ./hdfs.zip myJob.py
Check the corresponding documentation: Submitting Applications
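As a sketch of the programmatic equivalent (reusing the hdfs.zip archive from the command above and assuming a hypothetical module mylib packaged inside it), SparkContext.addPyFile ships the archive to every executor so the import also works in worker-side code:

from pyspark import SparkContext

sc = SparkContext(appName="myJob")

# Ship the zipped package to every executor; equivalent to --py-files on spark-submit.
sc.addPyFile("./hdfs.zip")

def job(x):
    # After addPyFile (or --py-files), the archive is on the executors' Python path,
    # so modules packaged inside it can be imported in executor code.
    import mylib  # hypothetical module packaged inside hdfs.zip
    return mylib.__name__

print(sc.parallelize(range(4)).map(job).collect())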
Upvotes: 2