Reputation: 3
I want to write the dataset in .csv file using spark java and the data of dataset should be in double quotes which are not null and also the headings in file which are actually field names of model class should not contain double quotes and values in file are separted by pipe delimiter(|)
example -> data should be appeared to file in below given format
NAME|ID|MOBILE_NUMBER|DEGREE|SALUTATION|LASTNAME|MAIL|DATE_OF_BIRTH "SMITH"|"56354"|"46535353"||"MR LOG"|"LOG"||"2013-10-24T11:04:52+00:00"
1st line is Heading which are field names of class and 2nd line is data
Here data of DEGREE and MAIL fields are null so they should not be double quoted also headings should not be double quoted
I tried this but not working in case of null values and headings for double quotes
dataset.toDF().selectExpr(NAME,ID,MOBILE,DEGREE,SALUTATION,LASTNAME,MAIL,DATE_OF_BIRTH)
.write().format("csv")
.option("delimiter", "|"),
.option("quotesAll",true),
.option("header", true)
.option("nullValue", null)
.mode(SaveMode.Overwrite)
.save(path);
Upvotes: 0
Views: 833
Reputation: 473
Option quoteAll
will quote everything (including header). Class CSVOptions
has no such option for custom formatting.
You can write a custom expression to quote not-null values and then save it.
Scala:
val quote = lit("\"")
//quote value if not null otherwise return.
def quoteColumn(c: Column) = when(c.isNotNull, concat(quote, c.cast(StringType), quote )) otherwise c.cast(StringType)
//call quoteColumn to add quote in all column values in the DataFrame
val resDF = df.select(df.columns.map(a => quoteColumn(col(a)).as(a)):_*)
resDF.write()
.format("csv")
.option("delimiter", "|")
.option("header", true)
//set quote to empty, no need to set null
.option("quote", "")
.mode(SaveMode.Overwrite)
.save(path);
Java:
private Column quoteColumn(Column c) {
Column quote = lit("\"");
return when(c.isNotNull(), concat(quote, c.cast(StringType), quote)).otherwise(c.cast(StringType));
}
Dataset<Row> resDF = df.select(Arrays.stream(df.columns()).map(a -> quoteColumn(col(a)).as(a)).toArray(Column[]::new));
resDF.write()
.format("csv")
.option("delimiter", "|")
.option("header", true)
// set quote to empty
.option("quote", "")
// set null for null values
.option("nullValue", null)
.mode(SaveMode.Overwrite)
.save(path);
Upvotes: 1