la_femme_it

Reputation: 672

How can I add external python libraries into HDFS?

Is there any way to add external libraries like this one into HDFS? It seems PySpark needs external libraries to be available in a shared folder on HDFS. But since I am using a shell script to run the PySpark script that depends on the external libraries, it fails to import them.

See post here about ImportError.
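Roughly, the shell script just launches the job like this (the script and file names below are only placeholders, not the real ones):

# what the wrapper shell script does, more or less
spark-submit --master yarn my_pyspark_job.py
# the job then fails on the worker nodes with an ImportError for the external library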

Upvotes: 0

Views: 407

Answers (2)

la_femme_it

Reputation: 672

We solved it by installing the library on all worker nodes; previously it was installed only on the NameNode.
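Roughly something like the following, run once per worker node (hostnames and the package name are placeholders):

# install the missing package on every worker, not just the NameNode
for host in worker1 worker2 worker3; do
    ssh "$host" "pip install the_external_lib"
done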

Upvotes: 0

Bameza

Reputation: 207

You can add external libraries with the --py-files option. You can provide either a .py file or a .zip archive.

For example, using spark-submit:

spark-submit --master yarn --py-files ./hdfs.zip myJob.py

See the corresponding documentation: Submitting Applications
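If the archive should live on HDFS, as the question asks, a sketch like this should also work on YARN (the paths here are only illustrative):

# package the library, upload it to HDFS, and reference it by its hdfs:// URI
zip -r hdfs.zip my_external_lib/
hdfs dfs -put -f hdfs.zip /user/me/hdfs.zip
spark-submit --master yarn --py-files hdfs:///user/me/hdfs.zip myJob.py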

Upvotes: 2
