Edamame

Reputation: 25366

Saved data has undesired quotation marks

I am using the following code to export my data frame to csv:

data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')

Note that I use delimiter="\t" because I don't want additional quotation marks around each field. However, when I checked the output CSV file, some fields were still enclosed in quotation marks, e.g.:

abcdABCDAAbbcd ....
1234_3456ABCD  ...
"-12345678AbCd"  ...

It seems that the quotation mark appears when the leading character of a field is "-". Why is this happening and is there a way to avoid this? Thanks!

Upvotes: 1

Views: 674

Answers (1)

zero323

Reputation: 330073

You aren't using all the options provided by the CSV writer. It has a quoteMode parameter which takes one of four values (descriptions from the org.apache.commons.csv documentation):

  • ALL - quotes all fields
  • MINIMAL (default) - quotes fields which contain special characters such as a delimiter, quotes character or any of the characters in line separator
  • NON_NUMERIC - quotes all non-numeric fields
  • NONE - never quotes fields

If you want to avoid quoting, the last option looks like a good choice, doesn't it?
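As a minimal sketch, here is the write call from the question with quoteMode added. The helper names (tsv_write_options, save_tsv) are made up for illustration, and the escape option is an assumption on my part: as far as I know, Commons CSV rejects NONE when no escape character is set, so passing one explicitly seems prudent. Check it against your spark-csv version.

```python
def tsv_write_options(quote_mode="NONE", escape="\\"):
    """Build spark-csv writer options for tab-separated, gzip-compressed output.

    quote_mode is one of ALL, MINIMAL, NON_NUMERIC, NONE.
    escape is an assumption: with NONE there are no quotes to protect a field
    that contains the delimiter, so an escape character is supplied instead.
    """
    return {
        "delimiter": "\t",
        "codec": "org.apache.hadoop.io.compress.GzipCodec",
        "quoteMode": quote_mode,
        "escape": escape,
    }


def save_tsv(df, path, **overrides):
    """Write a Spark DataFrame through the spark-csv writer (hypothetical helper)."""
    opts = tsv_write_options(**overrides)
    df.write.format("com.databricks.spark.csv").options(**opts).save(path)


# Usage with the DataFrame and path from the question:
# save_tsv(data, "s3a://myBucket/myPath")
```

Keep in mind that with NONE nothing is quoted at all, so a field that itself contains a tab has to be handled by the escape character or cleaned up before writing.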

Upvotes: 2
