Reputation: 2231
The code below does not add double quotes, which is supposed to be the default. I also tried setting # and a single quote with the quote option, with no success. I also tried the quoteMode option with ALL and NON_NUMERIC; still no change in the output.
s2d.coalesce(64).write
.format("com.databricks.spark.csv")
.option("header", "false")
.save(fname)
Are there any other options I can try? I am using spark-csv 2.11 over Spark 2.1.
Output it produces:
d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697
Output I am looking for:
"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697
Upvotes: 6
Views: 14868
Reputation: 74619
tl;dr Enable the quoteAll option.
scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")
The above gives the following output:
$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"
That assumes the quote character is " (see CSVOptions).
That, however, won't give you "double quotes around all non-numeric characters". Sorry.
You can see all the options in CSVOptions, which serves as the source of the options for the CSV reader and writer.
p.s. com.databricks.spark.csv is currently a mere alias for the csv format. You can use both interchangeably, but the shorter csv is preferred.
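As a sketch of that interchangeability (assuming a local SparkSession; the output paths out1 and out2 are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object CsvAliasDemo extends App {
  // Local session just for the demo.
  val spark = SparkSession.builder().master("local[*]").appName("csv-alias").getOrCreate()
  import spark.implicits._

  val df = Seq(("hello", 5)).toDF("word", "n")

  // Long form, as used in the question:
  df.write.format("com.databricks.spark.csv").option("quoteAll", true).save("out1")

  // Preferred shorthand -- resolves to the same built-in CSV writer in Spark 2.x:
  df.write.option("quoteAll", true).csv("out2")

  spark.stop()
}
```

Both calls produce identical files; only the format name differs.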
p.s. Use option("header", false) (false as a boolean, not a String); that will make your code slightly more type-safe.
Upvotes: 6
Reputation: 30300
In Spark 2.1, where the old CSV library has been inlined, I do not see any option for what you want in the csv method of DataFrameWriter (as seen here).
So I guess you have to map over your data "manually" to determine which of the Row components are non-numeric and quote them accordingly. You could use a straightforward isNumeric helper function like this:
def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)
As you map over your Dataset, quote the values where isNumeric is false.
Upvotes: 2