Reputation: 9159
I'm trying to save a CSV file from Spark, but when I open the file in vim I see the following characters. I don't understand what they mean or how they are getting into the file.
Here is my Spark writer:
df.write.partitionBy(partitionCol).format("csv").mode(writeMode)
  .option("sep", ",")
  .option("encoding", "UTF-8")
  .option("quote", "")
  .option("escape", "\\")
  .option("escapeQuotes", true)
  .option("quoteAll", true)
  .option("header", hasHeader)
  .option("nullValue", "")
  .option("dateFormat", "yyyy-MM-dd")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSZZ")
  .option("compression", "gzip")
  .save(outPath)
Upvotes: 0
Views: 1487
Reputation: 38238
From the documentation:
"quote – sets a single character used for escaping quoted values where the separator can be part of the value. If None is set, it uses the default value, ". If an empty string is set, it uses u0000 (null character)."
So, because you're setting quote to an empty string, Spark uses the null character as the quote character. With quoteAll set to true, every value is wrapped in null bytes, which vim displays as "^@" in caret notation.
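One way to avoid the null bytes is to give quote a real character. Here is a minimal sketch, assuming Spark is on the classpath and that an ordinary double quote is acceptable as the quote character. The sample data, the app name, and the output path /tmp/quote-option-sketch are hypothetical and only there to make the example self-contained. Compression and the date options are left out for brevity.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object QuoteOptionSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .appName("quote-option-sketch") // hypothetical name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows: one value contains the separator, one contains a quote.
    val df = Seq(("a,b", 1), ("say \"hi\"", 2)).toDF("text", "id")

    df.write
      .format("csv")
      .mode(SaveMode.Overwrite)
      .option("sep", ",")
      .option("encoding", "UTF-8")
      .option("quote", "\"")    // a real quote character instead of ""
      .option("escape", "\\")   // embedded quotes are escaped with a backslash
      .option("quoteAll", true) // every value is wrapped in the quote character
      .option("header", true)
      .save("/tmp/quote-option-sketch") // hypothetical output path

    spark.stop()
  }
}
```

With these settings the part files should contain values wrapped in double quotes rather than null bytes. If the goal was to have no quoting at all, leaving quoteAll at its default of false is probably closer to what you want, since values are then quoted only when they need to be.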
Upvotes: 4