dreddy
dreddy

Reputation: 463

spark df.write quote all fields but not null values

I am trying to create a csv from values stored in the table:

 | col1   | col2   | col3  |
 | "one"  | null   | "one" |
 | "two"  | "two"  | "two" |

hive > select * from table where col2 is null;
 one   null    one 

I am getting the csv using the below code:

df.repartition(1)
  .write.option("header",true)
  .option("delimiter", ",")
  .option("quoteAll", true)
  .option("nullValue", "")
  .csv(S3Destination)

Csv I get:

"col1","col2","col3"
"one","","one"
"two","two","two"

Expected Csv:WITH NO DOUBLE QUOTES FOR NULL VALUE

"col1","col2","col3"
"one",,"one"
"two","two","two"

Any help is appreciated to know if the dataframe writer has options to do this.

Upvotes: 5

Views: 2375

Answers (1)

Ram Ghadiyaram
Ram Ghadiyaram

Reputation: 29165

You can go in a udf approach and apply on the column (using withColumn on the repartitioned datafrmae above) where possiblity of double quote empty string is there see below sample code

 sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);

String has replace method which does the job.

val a =  Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))

will produce 'x',,z

Upvotes: 1

Related Questions