Capacytron

Spark CSV writer outputs double quotes for empty string

I've written a UDF in Scala for Spark:

import org.apache.spark.sql.functions.{col, udf}

// Returns "k1:v1,k2:v2", or an empty string if the map is empty
def mapToString: Map[String, Double] => String =
  m => m.map { case (k, v) => s"$k:$v" }.mkString(",")

val mapToStringUDF = udf(mapToString)

// Then I try to save my Dataset as CSV
myDataset
  .withColumn("map_str", mapToStringUDF(col("map")))
  .drop("map")
  .write
  .option("header", false)
  .option("delimiter", "\t")
  .csv("output.csv")

It writes "" whenever mapToStringUDF returns an empty string. I want nothing to be written for that field when mapToStringUDF returns an empty string.

What is the right way to do it?

Answers (1)

tpmiller85

The Spark DataFrameWriter has two CSV options that control this: nullValue (how null values are written) and emptyValue (how empty strings are written; when writing, its default is ""). You can set both to null instead of their default values. See the DataFrameWriter documentation.

In your example, you can simply add these options to your write statement:

myDataset
  .withColumn("map_str", mapToStringUDF(col("map")))
  .drop("map")
  .write
  .option("emptyValue", null) // empty strings: write nothing instead of ""
  .option("nullValue", null)  // null values: write nothing as well
  .option("header", "false")
  .option("delimiter", "\t")
  .csv("output.csv")

Or here's a full example, including test data:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val data = Seq(
  Row(null, "20200506", "Hello"),
  Row(2, "20200607", null),
  Row(3, null, "World")
  )

val schema = List(
  StructField("Item", IntegerType, true),
  StructField("Date", StringType, true),
  StructField("Message", StringType, true)
  )

val testDF = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
  )

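// PATH is a placeholder for your output directory.
// coalesce(1) writes a single part file, so the raw output matches the listing below.
testDF.coalesce(1)
  .write
  .option("emptyValue", null)
  .option("nullValue", null)
  .option("header", "true")
  .csv(PATH)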

The resulting raw .csv should look like this:

Item,Date,Message
,20200506,Hello
2,20200607,
3,,World
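
The test data above contains nulls but no empty strings, so it exercises nullValue but not emptyValue. Here is a minimal sketch that also covers the empty-string case, assuming Spark 2.4 or later and a local SparkSession; the object name EmptyValueDemo, the column names and the data are hypothetical. It writes one part file to a temporary directory and reads the raw lines back so you can check that no "" appears:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical demo object: writes a tiny DataFrame containing both an
// empty string and a null, then reads the raw CSV lines back.
object EmptyValueDemo {
  def writeAndReadBack(): Seq[String] = {
    // Assumption: a local SparkSession; in spark-shell you would reuse `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("emptyValue-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: row "a" has an empty string, row "b" has a null.
    val df = Seq(("a", ""), ("b", null: String), ("c", "x")).toDF("key", "value")

    // Output path inside a fresh temp directory (Spark creates "out" itself).
    val out = Files.createTempDirectory("csv-demo").resolve("out").toString

    df.coalesce(1) // one part file, so the raw output is easy to inspect
      .write
      .option("emptyValue", null) // empty strings: write nothing instead of ""
      .option("nullValue", null)  // nulls: write nothing
      .csv(out)

    // Read the raw text back, one element per CSV line.
    spark.read.text(out).as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit =
    writeAndReadBack().foreach(println)
}
```

Running main prints one raw line per row. Removing the two emptyValue/nullValue options lets you compare against the writer's default output on your Spark version.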
