Sushant Gupta

Reputation: 1527

How can I add configuration files to a Spark job running in YARN-CLUSTER mode?

I am using Spark 1.6.0. I want to upload a file using the --files flag and read its content after initializing the Spark context.

My spark-submit command looks like this:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /home/user/test.csv \
  /home/user/spark-test-0.1-SNAPSHOT.jar

I read the Spark documentation, which suggested using SparkFiles.get("test.csv"), but this does not work in yarn-cluster mode. If I change the deploy mode to local, the code works fine, but in yarn-cluster mode I get a file-not-found exception.

I can see in the logs that my file is uploaded to the hdfs://host:port/user/guest/.sparkStaging/application_1452310382039_0019/test.csv staging directory, but SparkFiles.get looks for the file at /tmp/test.csv, which is not correct. If someone has used this successfully, please help me solve this.
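For reference, the lookup described above would look roughly like this (a minimal sketch; the object name SparkTest and the app name are illustrative):

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import scala.io.Source

object SparkTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-test"))
    // SparkFiles.get resolves a name passed via --files to a local path.
    // In yarn-cluster mode the driver runs inside a YARN container, and
    // this resolved path is where the file-not-found exception surfaces.
    val path = SparkFiles.get("test.csv")
    val content = Source.fromFile(path).mkString
    println(content)
    sc.stop()
  }
}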

Upvotes: 1

Views: 553

Answers (1)

Kishore

Reputation: 5881

Spark submit command

spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /home/user/test.csv \
  /home/user/spark-test-0.1-SNAPSHOT.jar /home/user/test.csv
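In client mode the driver runs on the machine that launched spark-submit, so the local path passed as the first application argument can be opened directly.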

Read file in main program

import java.io.FileInputStream

def main(args: Array[String]) {
    // args(0) holds the local path passed after the application jar
    val fis = new FileInputStream(args(0))
    val content = scala.io.Source.fromInputStream(fis).mkString
    fis.close()
}
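If the job must stay in yarn-cluster mode, note that YARN localizes anything passed with --files into the working directory of the driver and executor containers, so the file can also be opened by its bare name (a minimal sketch; the name must match the file passed to --files):

import scala.io.Source

def main(args: Array[String]) {
    // With --files /home/user/test.csv in yarn-cluster mode, YARN places
    // test.csv in the container's current working directory.
    val content = Source.fromFile("test.csv").mkString
    println(content)
}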

Upvotes: 1
