horatio1701d

Reputation: 9159

Unwanted Characters in CSV

I'm trying to save a CSV file from Spark, but when I open the file in vim I see the following characters, and I don't understand what they mean or how they are getting into the file:

[screenshot of the characters shown in vim]

Here is my Spark writer:

df.write.partitionBy(partitionCol).format("csv").mode(writeMode)
  .option("sep", ",")
  .option("encoding", "UTF-8")
  .option("quote", "")
  .option("escape", "\\")
  .option("escapeQuotes", true)
  .option("quoteAll", true)
  .option("header", hasHeader)
  .option("nullValue", "")
  .option("dateFormat", "yyyy-MM-dd")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSZZ")
  .option("compression", "gzip")
  .save(outPath)

Upvotes: 0

Views: 1487

Answers (1)

Matt Gibson

Reputation: 38238

From the Spark CSV writer documentation:

"quote – sets a single character used for escaping quoted values where the separator can be part of the value. If None is set, it uses the default value, ". If an empty string is set, it uses u0000 (null character)."
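To make the documented rule concrete, here is a toy Scala model of it. It is not Spark's CSV writer: the object QuoteModel and the helpers resolveQuote and quoteAll are hypothetical names for illustration, and escaping of embedded quote characters is deliberately left out. It resolves the quote option as the documentation describes, then wraps every field the way quoteAll = true does, and prints NUL as ^@:

```scala
// Toy model (NOT Spark's implementation) of the documented `quote` rule
// combined with quoteAll = true. All names here are hypothetical.
object QuoteModel {
  // None -> default double quote; "" -> NUL (U+0000); one char -> that char.
  // This sketch rejects longer strings, since the option takes a single character.
  def resolveQuote(opt: Option[String]): Char = opt match {
    case None                     => '"'
    case Some("")                 => '\u0000'
    case Some(s) if s.length == 1 => s.head
    case Some(s) =>
      throw new IllegalArgumentException(s"quote must be a single character, got: $s")
  }

  // Wrap every field in the quote character, as quoteAll = true does.
  // Embedded quote characters are not escaped in this simplified model.
  def quoteAll(fields: Seq[String], quote: Char, sep: Char = ','): String =
    fields.map(f => s"$quote$f$quote").mkString(sep.toString)

  def main(args: Array[String]): Unit = {
    val q = resolveQuote(Some(""))          // what .option("quote", "") resolves to
    val line = quoteAll(Seq("a", "b"), q)
    // Show NUL the way vim does, in caret notation.
    println(line.map(c => if (c == '\u0000') "^@" else c.toString).mkString)
    // prints ^@a^@,^@b^@
  }
}
```

With the default quote (resolveQuote(None)) the same line would come out as "a","b".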

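So, because you're setting quote to an empty string, Spark uses the null character as the quote character, and since quoteAll is true, every value gets wrapped in null bytes. That is what vim is showing you: a null byte is traditionally displayed as "^@" in caret notation.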
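One way to confirm the diagnosis, sketched in plain Scala with no Spark dependency: count the NUL bytes in a gzipped output file and render characters in caret notation the way vim does. NullByteCheck, caret and countNulBytes are hypothetical names, and the sample file written in main only mimics what quote = NUL plus quoteAll = true would produce. If the count is non-zero on real output, the usual fix is to drop the .option("quote", "") line so Spark falls back to its default double quote, as the documentation above describes.

```scala
import java.io.{BufferedInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object NullByteCheck {
  // Caret notation as used by vim: 0x00 -> ^@, 0x01 -> ^A, ..., 0x1F -> ^_, 0x7F -> ^?.
  def caret(c: Char): String =
    if (c < 0x20) "^" + (c + 0x40).toChar
    else if (c == 0x7f) "^?"
    else c.toString

  // Count NUL (0x00) bytes in a gzip-compressed file, e.g. one gzipped CSV part file.
  def countNulBytes(path: Path): Long = {
    val in: InputStream =
      new GZIPInputStream(new BufferedInputStream(Files.newInputStream(path)))
    try {
      val buf = new Array[Byte](8192)
      var total = 0L
      var n = in.read(buf)
      while (n != -1) {
        var i = 0
        while (i < n) { if (buf(i) == 0) total += 1; i += 1 }
        n = in.read(buf)
      }
      total
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: one CSV line quoted with NUL instead of '"'.
    val tmp = Files.createTempFile("sample", ".csv.gz")
    val out = new GZIPOutputStream(Files.newOutputStream(tmp))
    try out.write("\u0000a\u0000,\u0000b\u0000\n".getBytes("UTF-8"))
    finally out.close()

    println(countNulBytes(tmp))                    // prints 4
    println("\u0000a\u0000".map(caret).mkString)   // prints ^@a^@
    Files.delete(tmp)
  }
}
```

Against real output, you would pass the path of one of the compressed part files Spark wrote under outPath (typically named like part-*.csv.gz).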

Upvotes: 4
