Reputation: 33
I'm trying to fill empty values with null when I split a column in Spark. Example:
| A |
|---|
| 1.2.3 |
| 4..5 |
I was looking for:
| A | A split 1 | A split 2 | A split 3 |
|---|---|---|---|
| 1.2.3 | 1 | 2 | 3 |
| 4..5 | 4 | null | 5 |
I got:
| A | A split 1 | A split 2 | A split 3 |
|---|---|---|---|
| 1.2.3 | 1 | 2 | 3 |
| 4..5 | 4 | 5 |
My code is:
df.withColumn("A", when(split(col("A"), "\\.") =!= lit(""), split(col("A"), "\\.")))
However, I got an error due to a type mismatch:
array(string) is not a string.
Is it possible to find a solution without using a UDF?
Many thanks
Upvotes: 2
Views: 1297
Reputation: 32660
You can split the column, then, when selecting the array items as columns, use when
to change an element to null if it is empty:
// n is the max array size from split (in your example it's 3)
val n = 3

val df1 = df.withColumn(
  "ASplit",
  split(col("A"), "[.]")
).select(
  Seq(col("A")) ++ (0 to n - 1).map(i =>
    when(col("ASplit")(i) === "", lit(null)).otherwise(col("ASplit")(i)).as(s"A split $i")
  ): _*
)
//+-----+---------+---------+---------+
//| A|A split 0|A split 1|A split 2|
//+-----+---------+---------+---------+
//|1.2.3| 1| 2| 3|
//| 4..5| 4| null| 5|
//+-----+---------+---------+---------+
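For intuition outside Spark: a regex split keeps *interior* empty strings, which is what lets the per-index `when` catch them. A minimal plain-Python sketch of the same per-index logic (the function name and `n` default here are illustrative, not part of the answer's code):

```python
import re

def split_with_nulls(s, n=3):
    # Split on "." ; interior empty strings are preserved by re.split,
    # then mapped to None (Spark's null). Missing indices also become None.
    parts = re.split(r"\.", s)
    return [parts[i] if i < len(parts) and parts[i] != "" else None
            for i in range(n)]

print(split_with_nulls("1.2.3"))  # ['1', '2', '3']
print(split_with_nulls("4..5"))   # ['4', None, '5']
```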
Upvotes: 1
Reputation: 42352
You can use transform (available since Spark 2.4) on
the split result, replacing empty values with null:
val result = df.withColumn(
"split",
expr("transform(split(A, '\\\\.'), x -> case when x = '' then null else x end)")
).select($"A", $"split"(0), $"split"(1), $"split"(2))
result.show
+-----+--------+--------+--------+
| A|split[0]|split[1]|split[2]|
+-----+--------+--------+--------+
|1.2.3| 1| 2| 3|
| 4..5| 4| null| 5|
+-----+--------+--------+--------+
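What `transform(split(A, '\\\\.'), x -> case when x = '' then null else x end)` computes per row can be sketched in plain Python (an illustration of the semantics, not Spark code):

```python
import re

def transform_split(s):
    # Element-wise: empty string -> None, everything else unchanged,
    # mirroring the SQL case-when inside transform.
    return [None if x == "" else x for x in re.split(r"\.", s)]

print(transform_split("4..5"))  # ['4', None, '5']
```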
Upvotes: 2