BdEngineer

Reputation: 3199

How to pass an external resource yml/properties file while running a Spark job on a cluster?

I am using spark-sql 2.4.1, the Jackson jars, and Java 8.

In my Spark program/job I am reading a few configurations/properties from an external "conditions.yml" file, which is placed in the "resources" folder of my Java project, as below:

    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    try {
        // load conditions.yml from the classpath (src/main/resources)
        driverConfig = mapper.readValue(
                Configuration.class.getClassLoader().getResourceAsStream("conditions.yml"),
                Configuration.class);
    } catch (IOException e) {
        throw new RuntimeException("Failed to load conditions.yml", e);
    }

If I want to pass "conditions.yml" file from outside while submitting spark-job how to pass this file ? where it should be placed?

In my program I am reading from the "resources" directory, i.e. .getResourceAsStream("conditions.yml"). If I pass this property file from spark-submit, will the job take it from the resources folder or from the external path?

If I want to pass it as an external file, do I need to change the code above?
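(For example, I assume reading from an external path instead of the classpath would look something like the sketch below, with the same Jackson mapper and Configuration class as above — the path is just a placeholder:)

    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    // placeholder path, not a real location on my cluster
    Configuration driverConfig = mapper.readValue(
            new File("/path/to/conditions.yml"), Configuration.class);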

Updated Question:

In my Spark driver program I am reading the property file from the program arguments, and it is loaded as below:

 Config props = ConfigFactory.parseFile(new File(args[0]));
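For reference, ConfigFactory here is Typesafe Config. A minimal standalone sketch of how the file is parsed and read — "some.property" is just a placeholder key, not one from my real file:

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;
    import java.io.File;

    public class ConfigCheck {
        public static void main(String[] args) {
            // parse the file whose path comes in as the first program argument
            Config props = ConfigFactory.parseFile(new File(args[0]));
            // "some.property" is a placeholder key
            System.out.println(props.getString("some.property"));
        }
    }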

While running my Spark job from a shell script, I submit it as below:

$SPARK_HOME/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--name MyDriver  \
--jars "/local/jars/*.jar" \
--files hdfs://files/application-cloud-dev.properties,hdfs://files/condition.yml \
--class com.sp.MyDriver \
--executor-cores 3 \
--executor-memory 9g \
--num-executors 5 \
--driver-cores 2 \
--driver-memory 4g \
--driver-java-options -Dconfig.file=./application-cloud-dev.properties \
--conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
--conf spark.driver.extraClassPath=. \
--driver-class-path . \
 ca-datamigration-0.0.1.jar application-cloud-dev.properties condition.yml

Error:

The properties are not loading. What is wrong here? And what is the correct way to pass program arguments to a Spark job's Java program?

Upvotes: 1

Views: 2320

Answers (2)

Thuy_Bui96

Reputation: 1

You can add:

    spec:
      args:
        - --deploy-mode
        - cluster

Upvotes: 0

Aaron

Reputation: 686

You will have to use --files with the path to your file in the spark-submit command to be able to pass any files. Please note the syntax for that is:

 "--file /home/user/config/my-file.yml" 

If it is on HDFS, then provide the HDFS path instead.

This should copy the file onto the classpath (it lands in the application's working directory), and your code should be able to find it from the driver.

The implementation of reading the file can be done with something like this:

import java.util.Properties
import scala.io.Source

def readProperties(propertiesPath: String): Properties = {
  // look the file up on the classpath
  val url = getClass.getResource("/" + propertiesPath)
  assert(url != null, s"Could not create URL to read $propertiesPath properties file")
  val source = Source.fromURL(url)
  val properties = new Properties
  try properties.load(source.bufferedReader)
  finally source.close()
  properties
}
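Since your driver is in Java 8, a rough Java equivalent of the above would be (a sketch only; MyDriver stands in for the main class from your spark-submit command):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public static Properties readProperties(String propertiesPath) throws IOException {
        // look the file up on the classpath, as the Scala version above does
        try (InputStream in = MyDriver.class.getClassLoader().getResourceAsStream(propertiesPath)) {
            if (in == null) {
                throw new IOException("Could not find " + propertiesPath + " on the classpath");
            }
            Properties properties = new Properties();
            properties.load(in);
            return properties;
        }
    }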

Hope that is what you are looking for.

Upvotes: 2
