Reputation: 1291
I'm reading a file delimited by a pipe (|). Some fields contain double quotes, which causes issues when reading the data and writing it into another file. The input file is given below.
123|"ABC"|hello
124|"AB|hello all
125|A"B"|hellll
The code is given below.
val myDf = session.sqlContext.read.format("csv")
.option("charset", "UTF8")
.option("inferSchema", "true")
.option("quote","\u0000")
.schema(mySchema)
.option("delimiter", "|")
.option("nullValue", "")
.option("treatEmptyValuesAsNulls", "true")
.load("path to file")
When I do myDf.show(), the output is displayed correctly in the console. But when I write the same dataframe to a CSV file, all double quotes are replaced by \".
myDf.repartition(1).write
  .format("com.databricks.spark.csv")
  .option("delimiter", "|")
  .save("Path to save file")
Output in the CSV file:
123|"\"ABC\""|hello
124|"\"AB"|hello all
125|"A\"B\""|hellll
Why does this happen? Is there any way to get the CSV as expected, shown below?
123|"ABC"|hello
124|"AB|hello all
125|A"B"|hellll
Upvotes: 4
Views: 15113
Reputation: 4540
This can be done by disabling both escaping and quoting:
myDf.repartition(1).write
  .format("com.databricks.spark.csv")
  .option("escape", "")     // turn off the escape character
  .option("quote", "")      // turn off quoting
  .option("delimiter", "|")
  .save("Path to save file")
Upvotes: 11