Ajay Kharade

Reputation: 1525

How to pass external configuration file to pyspark(Spark 2.x) program?

When I run the pyspark program in the interactive shell, the script is able to fetch the configuration file (config.ini). But when I run the same script with spark-submit on master yarn with deploy mode cluster, it fails with an error saying the config file does not exist; I checked the YARN log and can see the same error there. Below is the command for running the pyspark job.

spark2-submit --master yarn --deploy-mode cluster test.py /home/sys_user/ask/conf/config.ini

Upvotes: 0

Views: 3435

Answers (2)

Jugal Panchal

Reputation: 1548

Pass the ini file via the spark.files parameter:

.config('spark.files', 'config/local/config.ini') \

Read in pyspark:

with open(SparkFiles.get('config.ini')) as config_file:
    print(config_file.read())

It works for me.
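To flesh out the answer above: spark.files ships the file to the working directory of the driver and executors, and SparkFiles.get resolves its local path there, so the ini can then be parsed with the standard configparser module. A minimal runnable sketch (the section and key names are made up for illustration, and a temp file stands in for the path SparkFiles.get('config.ini') would return on the cluster):

```python
import configparser
import os
import tempfile

# Hypothetical contents standing in for the shipped config.ini.
SAMPLE = """[database]
host = db.example.com
port = 5432
"""

# On a real cluster this path would come from SparkFiles.get('config.ini');
# a temp file is used here so the sketch runs locally without Spark.
path = os.path.join(tempfile.mkdtemp(), "config.ini")
with open(path, "w") as f:
    f.write(SAMPLE)

config = configparser.ConfigParser()
config.read(path)
print(config["database"]["host"])  # db.example.com
print(config.getint("database", "port"))  # 5432
```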

Upvotes: 0

Ajay Kharade

Reputation: 1525

The spark2-submit command provides a --properties-file parameter; you can use it to make the properties file available to the spark-submit command.

e.g. spark2-submit --master yarn --deploy-mode cluster --properties-file $CONF_FILE_NAME pyspark_script.py
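One caveat worth noting: --properties-file is meant for Spark configuration, and spark-submit warns about and ignores keys that do not start with spark., so application settings should be namespaced accordingly. A hypothetical properties file might look like:

```properties
# Standard Spark settings
spark.executor.memory=4g

# Custom application settings, namespaced under spark. so they are kept
spark.myapp.db.host=db.example.com
spark.myapp.db.port=5432
```

Inside the pyspark script the custom values can then be read back with, e.g., spark.conf.get("spark.myapp.db.host").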

Upvotes: 1
