Reputation: 25366
I am using the following code to export my data frame to csv:
data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')
Note that I use delimiter="\t" because I don't want additional quotation marks added around each field. However, when I checked the output CSV file, some fields were still enclosed in quotation marks, e.g.:
abcdABCDAAbbcd ....
1234_3456ABCD ...
"-12345678AbCd" ...
It seems that the quotation marks appear when the leading character of a field is "-". Why is this happening, and is there a way to avoid it? Thanks!
Upvotes: 1
Views: 674
Reputation: 330073
You aren't using all the options provided by the CSV writer. It has a quoteMode parameter, which takes one of four values (descriptions from the org.apache.commons.csv documentation):
ALL - quotes all fields
MINIMAL (default) - quotes fields which contain special characters such as a delimiter, the quote character, or any of the characters in the line separator
NON_NUMERIC - quotes all non-numeric fields
NONE - never quotes fields
If you want to avoid quoting, the last option looks like a good choice, doesn't it?
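As a rough sketch of what that could look like with the writer call from the question: the helper names tsv_write_options and export_tsv are hypothetical, made up here for illustration; only the quoteMode option and its four values come from the list above. Keep in mind that with NONE a field which itself contains a tab is no longer protected by quotes.

```python
# Quote modes accepted by the quoteMode option, as listed above.
QUOTE_MODES = ("ALL", "MINIMAL", "NON_NUMERIC", "NONE")


def tsv_write_options(quote_mode="NONE"):
    """Hypothetical helper: the options from the question plus quoteMode."""
    if quote_mode not in QUOTE_MODES:
        raise ValueError("unknown quoteMode: %r" % (quote_mode,))
    return {
        "delimiter": "\t",
        "codec": "org.apache.hadoop.io.compress.GzipCodec",
        "quoteMode": quote_mode,
    }


def export_tsv(df, path, quote_mode="NONE"):
    """Hypothetical helper: write a Spark DataFrame through spark-csv."""
    (df.write
       .format("com.databricks.spark.csv")
       .options(**tsv_write_options(quote_mode))
       .save(path))


# Usage with the DataFrame and path from the question (needs a Spark session):
# export_tsv(data, "s3a://myBucket/myPath")
```

Passing quoteMode="NONE" directly in the original .options(...) call should be equivalent; the helper only keeps the option set in one place.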
Upvotes: 2