Reputation: 1730
How can I remove the null items from array(1, 2, null, 3, null)? Using the array_remove function doesn't help when the items to remove are null.
Upvotes: 3
Views: 2128
Reputation: 856
There is already an accepted answer, but I'll leave this one for anyone working with Java. It can be done with array_compact (org.apache.spark.sql.functions.array_compact), although that is only available from Spark 3.4.0 onwards. I took the snippet below from a comment, thanks @HarlanNelson:
// the "values" column is text: "1.1,2,,,,,3.5, 4.1"
.withColumn("values_array", filter(split(col("values"), ",").cast("array<float>"), x -> x.isNotNull()))
// values_array: [1.1, 2, 3.5, 4.1]
Upvotes: 0
Reputation: 24386
Spark 3.4+
F.array_compact("col_name")
Note that array_compact does not remove duplicates.
Full example:
from pyspark.sql import functions as F
df = spark.createDataFrame([([1, 2, None, 3, None],)], ["c"])
df.show(truncate=0)
# +---------------------+
# |c |
# +---------------------+
# |[1, 2, null, 3, null]|
# +---------------------+
df = df.withColumn("c", F.array_compact("c"))
df.show()
# +---------+
# | c|
# +---------+
# |[1, 2, 3]|
# +---------+
Upvotes: 1
Reputation: 1730
I used the following trick with the array_except() function:
SELECT array_except(array(1, 2, null, 3, null), array(null))
which returns [1, 2, 3].
Upvotes: 3