verojoucla

Reputation: 649

Remove special character from array pyspark

I have a PySpark DataFrame with a column that contains an array of strings.

df example:

number | id
-----------------------
12     | [12, .AZ, .UI]
14     | [CL, .RT, OP.]

I want to remove the character '.' from every element of the array.

I tried using regexp_replace:

df = df.select("id", F.regexp_replace(F.col("id"), ".").alias("id"))

But I think regexp_replace works on string columns, not on arrays.

How can I remove this character from every element of the array? Thank you.

Upvotes: 1

Views: 831

Answers (1)

anky

Reputation: 75130

In Spark 2.4 or later, you can use the higher-order function transform, which applies an expression to each element of the array:

import pyspark.sql.functions as F

# replace() does a literal substitution, so '.' is not treated as a regex wildcard
df.withColumn("id", F.expr("transform(id, x -> replace(x, '.', ''))")).show()

+------+------------+
|number|          id|
+------+------------+
|    12|[12, AZ, UI]|
|    14|[CL, RT, OP]|
+------+------------+
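
For Spark versions before 2.4, where transform is not available, one possible fallback is a Python UDF. The following is only a minimal sketch, not part of the original answer: the helper names strip_dots and remove_dots_from_id are hypothetical, and the column is assumed to be an array of strings named "id", as in the question.

```python
from typing import List, Optional


def strip_dots(values: Optional[List[str]]) -> Optional[List[str]]:
    """Remove every literal '.' from each string; pass None through unchanged."""
    if values is None:
        return None
    # str.replace is a literal substitution, so '.' is not a regex wildcard here.
    return [v.replace(".", "") if v is not None else None for v in values]


def remove_dots_from_id(df):
    """Apply strip_dots to the array column "id" (hypothetical helper)."""
    # Imported here so strip_dots can be tried without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    strip_dots_udf = F.udf(strip_dots, ArrayType(StringType()))
    return df.withColumn("id", strip_dots_udf(F.col("id")))
```

Calling remove_dots_from_id(df).show() should give the same result as the transform version. Python UDFs are generally slower than built-in expressions, so transform is the better choice whenever Spark 2.4 or later is available.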


Upvotes: 2
