Moein

Reputation: 1730

Removing NULL items from PySpark arrays

How can I remove the null items from array(1, 2, null, 3, null)? The array_remove function doesn't help when the item we want to remove is null.

Upvotes: 3

Views: 2128

Answers (3)

jornathan

Reputation: 856

There is already an accepted answer; I'm leaving this one for anyone working with Java.

It can be done with array_compact (org.apache.spark.sql.functions.array_compact), but that function is only available from Spark 3.4.0.

The example below is taken from a comment; thanks @HarlanNelson.

// I have a text column; col("values") = "1.1,2,,,,,3.5, 4.1"
.withColumn("values_array", filter(split(col("values"), ",").cast("array<float>"), x -> x.isNotNull()))

// [1.1, 2, 3.5, 4.1]

Upvotes: 0

ZygD

Reputation: 24386

Spark 3.4+

F.array_compact("col_name")

array_compact does not remove duplicates.


Full example:

from pyspark.sql import functions as F
df = spark.createDataFrame([([1, 2, None, 3, None],)], ["c"])
df.show(truncate=0)
# +---------------------+
# |c                    |
# +---------------------+
# |[1, 2, null, 3, null]|
# +---------------------+

df = df.withColumn("c", F.array_compact("c"))

df.show()
# +---------+
# |        c|
# +---------+
# |[1, 2, 3]|
# +---------+

Upvotes: 1

Moein

Reputation: 1730

I used the following trick with the array_except() function:

SELECT array_except(array(1, 2, null, 3, null), array(null)) returns [1,2,3]

Upvotes: 3
