Rubin Porwal

Reputation: 3845

How to configure an external package in Apache Spark?

I am building a Python script, executed with the spark-submit command, that retrieves data from a MongoDB collection and processes the fetched data to generate analytics. I am using the MongoDB Spark connector, pulled in through the --packages option, to query the collection.
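For context, the kind of invocation being described might look like the following; the connector coordinates and script name are illustrative, not taken from the question:

    # pull the MongoDB Spark connector from Maven Central at submit time
    spark-submit \
      --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 \
      my_script.py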

However, I need to configure the package in Apache Spark itself and run the Python script with spark-submit without the --packages option.

Upvotes: 0

Views: 706

Answers (1)

shuaiyuancn

Reputation: 2794

From http://spark.apache.org/docs/latest/submitting-applications.html:

For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg.

So you could write your own data-loading layer. However, using a ready-made package has many advantages. Could you explain why you cannot use --packages?
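As a rough illustration of shipping your own Python code this way (deps.zip and my_script.py are placeholder names, not from the question), the submit command could look like:

    # distribute local .py/.zip/.egg dependencies with the application; no --packages involved
    spark-submit \
      --py-files deps.zip \
      my_script.py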

EDIT

Based on the chat, the only reason the OP couldn't use --packages is that the MongoDB connector jar is stored locally (and is not available from a remote repository). In this case, passing the jar directly with --jars /PATH/TO/JAR should fix the problem.
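A minimal sketch of that submit command, assuming a 2.x-series connector whose input URI is set via spark.mongodb.input.uri; the jar path, MongoDB URI, and script name are placeholders:

    # reference the local connector jar directly instead of resolving it via --packages;
    # add any dependency jars as a comma-separated list after --jars
    spark-submit \
      --jars /PATH/TO/JAR \
      --conf spark.mongodb.input.uri=mongodb://localhost:27017/db.collection \
      my_script.py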

Upvotes: 1
