Reputation: 3729
I've written a UDF for Scala Spark:
import org.apache.spark.sql.functions.{col, udf}

// returns "k1:v1,k2:v2", or an empty string if the map is empty
def mapToString: Map[String, Double] => String =
  m => m.map { case (k, v) => s"$k:$v" }.mkString(",")

val mapToStringUDF = udf(mapToString)
Then I try to save my Dataset as CSV:
myDataset
.withColumn("map_str", mapToStringUDF(col("map")))
.drop("map")
.write
.option("header", false)
.option("delimiter", "\t")
.csv("output.csv")
It writes "" to the output when mapToStringUDF returns an empty string, but I want to get nothing in the output in that case.
What is the right way to do it?
Upvotes: 2
Views: 3975
Reputation: 86
The Spark DataFrameWriter has two options for the csv format that you can set: nullValue and emptyValue, both of which you can set to null instead of an empty string. See the DataFrameWriter documentation.
In your specific example you can just add the options to your write
statement:
myDataset
.withColumn("map_str", mapToStringUDF(col("map")))
.drop("map")
.write
.option("emptyValue", null)
.option("nullValue", null)
.option("header", "false")
.option("delimiter", "\t")
.csv("output.csv")
Or here's a full example, including test data:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val data = Seq(
Row(null, "20200506", "Hello"),
Row(2, "20200607", null),
Row(3, null, "World")
)
val schema = List(
StructField("Item", IntegerType, true),
StructField("Date", StringType, true),
StructField("Message", StringType, true)
)
val testDF = spark.createDataFrame(
spark.sparkContext.parallelize(data),
StructType(schema)
)
testDF.write
.option("emptyValue", null)
.option("nullValue", null)
.option("header", "true")
.csv(PATH)
The resulting raw .csv
should look like this:
Item,Date,Message
,20200506,Hello
2,20200607,
3,,World
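If you want to double-check the behavior, a minimal sketch (assuming PATH is the same placeholder path used above, and checkDF is just an illustrative name) is to read the files back in; Spark's CSV reader maps missing fields to null by default, so the blank cells should come back as nulls:
// Read the written files back; empty fields are parsed as null by default
val checkDF = spark.read
  .option("header", "true")
  .csv(PATH)

// Rows where Message was null/empty should show up here
checkDF.filter("Message IS NULL").show()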
Upvotes: 7