Juhan

Reputation: 1291

Spark dataframe not writing Double quotes into csv file properly

I'm reading a file delimited by a pipe (|). Some fields contain double quotes, which causes issues when reading the data and writing it to another file. The input file is shown below.

123|"ABC"|hello
124|"AB|hello all
125|A"B"|hellll

The code is given below.

val myDf = session.read.format("csv")
      .option("charset", "UTF-8")
      .option("quote", "\u0000")  // set the quote char to NUL to disable quote handling on read
      .schema(mySchema)           // explicit schema, so inferSchema is unnecessary
      .option("delimiter", "|")
      .option("nullValue", "")
      .option("treatEmptyValuesAsNulls", "true")
      .load("path to file")

When I call myDf.show(), the output appears correctly in the console. But when I write the same DataFrame to a CSV file, all double quotes are escaped as \".

myDf.repartition(1).write
      .format("com.databricks.spark.csv")
      .option("delimiter", "|")
      .save("Path to save file")

Output in the CSV file:

123|"\"ABC\""|hello
124|"\"AB"|hello all
125|"A\"B\""|hellll

Why does this happen? Is there any way to get the CSV output as expected below?

123|"ABC"|hello
124|"AB|hello all
125|A"B"|hellll

Upvotes: 4

Views: 15113

Answers (1)

ollik1

Reputation: 4540

It can be done by disabling both escaping and quoting on write:

myDf.repartition(1).write
      .format("com.databricks.spark.csv")
      .option("escape", "")     // disable the default backslash escape character
      .option("quote", "")      // disable quoting, so " is written through verbatim
      .option("delimiter", "|")
      .save("Path to save file")
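For context on why `\"` appeared in the first place: by default Spark's CSV writer quotes any field containing the quote character (`"`) and escapes embedded quotes with a backslash, since `escape` defaults to `\`. If you would rather keep quoting but produce standard RFC 4180 output (embedded quotes doubled as `""` instead of backslash-escaped), a sketch under the same assumptions as above (paths are placeholders, `myDf` is the DataFrame from the question) would be:

    // Hedged alternative: keep quoting, set escape = quote char so
    // embedded quotes are doubled (RFC 4180 style) rather than backslash-escaped.
    myDf.repartition(1).write
          .format("csv")
          .option("delimiter", "|")
          .option("escape", "\"")
          .save("Path to save file")

This does not reproduce the exact output in the question (it would write, e.g., `123|"""ABC"""|hello`), but most CSV parsers will read it back correctly, whereas fully unquoted output breaks if a field ever contains the delimiter.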

Upvotes: 11
