a-better-world

Reputation: 43

How to pass a configuration file hosted in HDFS to a Spark application?

I'm working with Spark Structured Streaming in Scala. I want to pass a config file to my Spark application. The configuration file is hosted in HDFS. For example:

spark_job.conf (HOCON)

spark {
  appName: "",
  master: "",
  shuffle.size: 4 
  etc..
}

kafkaSource {
  servers: "",
  topic: "",
  etc..
}

redisSink {
  host: "",
  port: 999,
  timeout: 2000,
  checkpointLocation: "hdfs location",
  etc..
}

How can I pass it to the Spark application? How can I read this file (hosted in HDFS) in Spark?

Upvotes: 1

Views: 2761

Answers (1)

Yayati Sule

Reputation: 1631

You can read the HOCON config from HDFS in the following way:

import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.conf.Configuration

// Obtain a handle to HDFS. A plain new Configuration() picks up core-site.xml /
// hdfs-site.xml from the classpath, so "hdfs://" resolves to the cluster's default namenode.
val hdfs: FileSystem = FileSystem.get(new URI("hdfs://"), new Configuration())

// Open the file as a character stream and parse it with Typesafe Config,
// closing the stream once parsing is done.
val reader = new InputStreamReader(hdfs.open(new Path("/path/to/conf/on/hdfs")))
val conf: Config = try ConfigFactory.parseReader(reader) finally reader.close()
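
Once parsed, values can be read with the standard Typesafe Config accessors. A minimal sketch, assuming the spark_job.conf layout from the question:

val redisHost: String  = conf.getString("redisSink.host")
val redisPort: Int     = conf.getInt("redisSink.port")      // 999 in the example
val timeoutMs: Int     = conf.getInt("redisSink.timeout")   // 2000 in the example
val checkpoint: String = conf.getString("redisSink.checkpointLocation")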

You can also pass the URI of your namenode explicitly, as in FileSystem.get(new URI("your_uri_here"), new Configuration()), and the code will still read your configuration.
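
As for getting the file's location to the application in the first place, one common approach (not part of the answer above) is to pass the HDFS path as a program argument at submit time and read it in main. A sketch, where the class name, jar name, and path are hypothetical:

import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Submitted with (hypothetical names):
//   spark-submit --class com.example.StreamingJob my-app.jar hdfs:///path/to/spark_job.conf
object StreamingJob {
  def main(args: Array[String]): Unit = {
    val confPath = args(0) // e.g. "hdfs:///path/to/spark_job.conf"
    // A fully qualified hdfs:// URI selects the matching FileSystem implementation
    val hdfs = FileSystem.get(new URI(confPath), new Configuration())
    val reader = new InputStreamReader(hdfs.open(new Path(confPath)))
    val conf: Config = try ConfigFactory.parseReader(reader) finally reader.close()
    // ... build the SparkSession and streaming query from conf ...
  }
}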

Upvotes: 5
