Reputation: 21
I am trying to read a UTF-8 encoded file into Spark with Scala. I am doing this:
val nodes = sparkContext.textFile("nodes.csv")
where the given CSV file is in UTF-8, but Spark converts the non-English characters to ?
How do I get it to read the actual values? I tried the same thing in PySpark and it works fine, presumably because PySpark's textFile()
function has an encoding option and supports UTF-8 by default (it seems).
I am sure the file is in UTF-8 encoding. I ran this to confirm:
➜ workspace git:(f/playground) ✗ file -I nodes.csv
nodes.csv: text/plain; charset=utf-8
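One possible cause (an assumption, not confirmed by the question) is that the JVM's default charset is not UTF-8, so characters get mangled when lines are turned into strings. A workaround is to read the raw bytes and decode them explicitly as UTF-8, bypassing the default charset entirely. This is a minimal sketch; the decodeUtf8 helper name is mine, and the commented Spark usage assumes the nodes.csv path from the question:

```scala
import java.nio.charset.StandardCharsets

object Utf8Read {
  // Decode raw bytes explicitly as UTF-8, ignoring the JVM default charset,
  // and split the result into lines.
  def decodeUtf8(bytes: Array[Byte]): Array[String] =
    new String(bytes, StandardCharsets.UTF_8).split("\n")

  // With Spark, the same idea can be applied via binaryFiles, which hands
  // back the file contents as a byte stream instead of pre-decoded lines:
  //
  //   val nodes = sparkContext.binaryFiles("nodes.csv").flatMap {
  //     case (_, stream) => decodeUtf8(stream.toArray)
  //   }
}
```

Alternatively, forcing the JVM default charset itself may help, e.g. passing -Dfile.encoding=UTF-8 through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions (again, a guess at the root cause rather than a confirmed fix).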
Upvotes: 0
Views: 2885