user13730368

Reputation:

How to load external resource files in a Spark application?

Hello, I am trying to use a resource file (a CSV) that I load in my Spark application when I run it:

   val resSource: BufferedSource = Source.fromResource("res.csv")

This works when I run from the IDE and the CSV is in the resources folder.

But when I build a fat JAR with assembly, this code throws a NullPointerException.

What is the best way to read external files when the JAR is deployed using spark-submit?

I have already tried previously answered solutions such as getClass.getResource, but they don't work. Can someone show me how to do it? In particular, I want to know how to use the --files option and how to access the file in the application.

Upvotes: 0

Views: 633

Answers (1)

Mansoor Baba Shaik

Reputation: 492

$ spark-shell --files data.txt


scala> import java.io.File
import java.io.File

scala> import org.apache.spark.SparkFiles
import org.apache.spark.SparkFiles

scala> val rootDir = new File(SparkFiles.getRootDirectory())
rootDir: java.io.File = /private/var/folders/lr/shv51xn15zqdtzb67c7f4ydc0000gn/T/spark-ce4fd20e-ebd2-4c05-b6a7-39c338332dd3/userFiles-3c78071d-75bb-47e1-a3e0-0be2b83c1600

scala> rootDir.listFiles.foreach(x => println(x.getName))
data.txt

scala> val dataFile = SparkFiles.get("data.txt")
dataFile: String = /private/var/folders/lr/shv51xn15zqdtzb67c7f4ydc0000gn/T/spark-ce4fd20e-ebd2-4c05-b6a7-39c338332dd3/userFiles-3c78071d-75bb-47e1-a3e0-0be2b83c1600/data.txt

scala> spark.read.text(dataFile).show()
+---------------+
|          value|
+---------------+
|'$1,200.00',abc|
|'$1,201.00',und|
|'$1,202.00',jsn|
|'$1,203.00',yhs|
|'$1,204.00',rfs|
|'$1,205.00',jsn|
|'$1,202.00',han|
+---------------+

https://spark.apache.org/docs/2.4.5/api/scala/#org.apache.spark.SparkFiles$
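The same mechanism works for a compiled application: pass the file with `spark-submit --files res.csv ...` and resolve its distributed local path with `SparkFiles.get` inside the job. A minimal sketch (the object name and file name are illustrative; in cluster mode, check that the resolved path is visible where the read actually happens):

```scala
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object ResourceApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("resource-demo").getOrCreate()

    // spark-submit --files res.csv ships the file to the driver and executors;
    // SparkFiles.get returns the local path it was copied to.
    val path = SparkFiles.get("res.csv")

    val df = spark.read.option("header", "true").csv(path)
    df.show()

    spark.stop()
  }
}
```

Submitted, for example, as:

    $ spark-submit --class ResourceApp --files res.csv my-app-assembly.jar

Note that `Source.fromResource` fails here because `--files` places the file on the local filesystem, not on the JAR's classpath; `SparkFiles.get` is the intended lookup for `--files` payloads.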

Upvotes: 1
