Amber

Reputation: 944

Pass delimiter to Spark as an argument

I'm trying to pass a value to my Spark program to be used as the delimiter when reading a .dat file. My code looks something like this:

val delim = args(0)
val df = spark.read.format("csv").option("delimiter", delim).load("/path/to/file/")

And I run the program with the following command:

spark2-submit --class a.b.c.MyClass My.jar \\u0001

But I get an error saying that multiple characters can't be used as a delimiter. However, when I use the string literal directly instead of reading it from a variable, the code works fine:

val df = spark.read.format("csv").option("delimiter", "\u0001").load("/path/to/file/")

Can someone help me with this?

Upvotes: 2

Views: 571

Answers (1)

Kombajn zbożowy

Reputation: 10703

String "\u0001" is a unicode character, but what is passed to spark from the command line is a literal string "\\u0001". You need to explicitly unescape Unicode:

val df = spark.read.format("csv").option("delimiter", unescapeUnicode(delim)).load("/path/to/file/")

You can find an unescapeUnicode function in this answer.
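For illustration, a minimal version could look like this (just a sketch that assumes the delimiter is always given in the \uXXXX form; the linked answer may implement it differently):

// Hypothetical helper: turns literal "\uXXXX" escape sequences in the input
// into the corresponding characters, e.g. the six-character text "\u0001"
// becomes the single control character U+0001.
def unescapeUnicode(s: String): String =
  """\\u([0-9a-fA-F]{4})""".r.replaceAllIn(s, m =>
    Integer.parseInt(m.group(1), 16).toChar.toString)

val delim = unescapeUnicode(args(0))  // literal "\u0001" -> actual U+0001 character
val df = spark.read.format("csv").option("delimiter", delim).load("/path/to/file/")

Alternatively, if Apache Commons Lang is already on your classpath, StringEscapeUtils.unescapeJava should perform the same conversion.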

Upvotes: 3
