Reputation: 3845
I am building a Python script, executed with the spark-submit command, that retrieves data from a MongoDB collection and processes the fetched data to generate analytics. I am using the MongoDB Spark Connector to query the collection, and I currently load the connector with the --packages option.
However, I need to configure the package in Apache Spark itself and run the Python script with spark-submit without the --packages option.
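For reference, the current launch command looks something like this (the connector coordinates and script name are illustrative, not taken from the question):

    spark-submit \
      --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 \
      main.py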
Upvotes: 0
Views: 706
Reputation: 2794
From http://spark.apache.org/docs/latest/submitting-applications.html:
For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg.
So you could write your own data-loading layer and ship it this way, as sketched below. However, a ready-made package has a lot of advantages. Could you explain why you cannot use --packages?
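A sketch of that approach: if your own data-loading code (for example, built on pymongo) were zipped into deps.zip and the entry point were main.py (both names are placeholders), the launch would look like:

    spark-submit --py-files deps.zip main.py

Note that --py-files only distributes Python code to the executors; it does not put a JVM jar such as the MongoDB connector on Spark's classpath.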
EDIT
Based on the chat, the only reason the OP couldn't use --packages is that the jar for the MongoDB connector is stored locally (and is therefore not resolvable from a Maven repository). In that case, passing the jar directly with --jars /PATH/TO/JAR should fix the problem.
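A minimal sketch, reusing the placeholder path above (main.py stands in for the actual script):

    spark-submit --jars /PATH/TO/JAR main.py

To avoid passing the flag on every run, the same jar can be registered once in conf/spark-defaults.conf, so every spark-submit picks it up without any extra options:

    # conf/spark-defaults.conf
    spark.jars    /PATH/TO/JAR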
Upvotes: 1