hellotherebj

Reputation: 121

EMR PySpark job: how to import Python libraries stored in S3

We have a main job, let's call it main.py.

In main.py we import other Python libraries that are stored in S3; call them test1.py and test2.py.

When I submit main.py to Spark, it cannot pick up test1.py and test2.py from S3.

How do I configure the job so that it picks up test1.py and test2.py?

Upvotes: 2

Views: 957

Answers (1)

A.B

Reputation: 20455

If you have the right permissions to access the S3 bucket, you can include them in the spark-submit command with --py-files, like below:

spark-submit --py-files s3a://bucket/your-folder/test1.py,s3a://bucket/your-folder/test2.py main.py
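Equivalently, you can distribute the dependencies from inside the driver with SparkContext.addPyFile, which also accepts paths on Hadoop-supported filesystems such as S3. A minimal sketch, assuming the same placeholder bucket/your-folder paths as above:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("main").getOrCreate()
    sc = spark.sparkContext

    # Ship each dependency file to the driver and every executor;
    # the bucket and folder names are placeholders.
    sc.addPyFile("s3a://bucket/your-folder/test1.py")
    sc.addPyFile("s3a://bucket/your-folder/test2.py")

    # Once added, the modules are importable as usual.
    import test1
    import test2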

Alternatively, you can add a copy step to the cluster that downloads the files from S3 onto the EMR node before the job runs.
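As a sketch of that approach, the step can run aws s3 cp through command-runner.jar; here it is submitted with boto3's add_job_flow_steps. The region, cluster ID, bucket, and target directory are all placeholder assumptions:

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

    # Copy a dependency file from S3 onto the master node's local disk.
    # j-XXXXXXXXXXXXX, the bucket path, and /home/hadoop/ are placeholders.
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",
        Steps=[
            {
                "Name": "Copy test1.py from S3",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "aws", "s3", "cp",
                        "s3://bucket/your-folder/test1.py",
                        "/home/hadoop/",
                    ],
                },
            },
        ],
    )

Note that EMR steps run on the master node, so this makes the files available there for a local spark-submit --py-files with local paths.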

Upvotes: 2
