Reputation: 121
We have a main job, let's call it main.py.
main.py depends on other Python modules that are stored in S3, call them test1.py and test2.py.
When I submit main.py to Spark, it cannot pick up test1.py and test2.py from S3.
How do I configure the job so that it picks up test1.py and test2.py?
Upvotes: 2
Views: 957
Reputation: 20455
If you have the right permissions to access the S3 bucket, you can include them in the spark-submit command with --py-files, like below:

spark-submit --py-files s3a://bucket/your-folder/test1.py,s3a://bucket/your-folder/test2.py main.py
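
Once the files are shipped with --py-files, main.py can import them like ordinary modules. A minimal sketch of what main.py might look like; the helper function name is hypothetical, just for illustration:

    # main.py
    from pyspark.sql import SparkSession

    import test1  # shipped to driver and executors via --py-files
    import test2

    spark = SparkSession.builder.appName("main").getOrCreate()

    # Functions from the shipped modules can also run on executors,
    # e.g. inside map calls (test1.helper is a hypothetical function).
    rdd = spark.sparkContext.parallelize([1, 2, 3])
    print(rdd.map(test1.helper).collect())

    spark.stop()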
Also, you can add a copy step that downloads the files to the EMR nodes before the job runs.
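
If you'd rather not change the spark-submit command, another option is to fetch the dependencies from inside the job with SparkContext.addPyFile. A minimal sketch, assuming the same placeholder bucket and folder as above and that the cluster (e.g. EMR) has S3 access configured:

    # main.py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("main").getOrCreate()
    sc = spark.sparkContext

    # Pull the modules from S3 at runtime and put them on the Python path
    # of the driver and executors (bucket/folder are placeholders).
    sc.addPyFile("s3a://bucket/your-folder/test1.py")
    sc.addPyFile("s3a://bucket/your-folder/test2.py")

    # Import only after addPyFile, so the modules are already available.
    import test1
    import test2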
Upvotes: 2