Reputation: 649
I have a PySpark DataFrame that contains a column holding an array of strings.
df example:
number | id
------------------------
12     | [12, .AZ, .UI]
14     | [CL, .RT, OP.]
I want to remove the character '.' from every element of the array.
I tried using regexp_replace:
df = df.select("number", F.regexp_replace(F.col("id"), r"\.", "").alias("id"))
But I think regexp_replace is a good solution for a string column, not for an array.
How can I remove this character from each element of the array? Thank you.
Upvotes: 1
Views: 831
Reputation: 75130
In Spark 2.4 or later, you can use the transform higher-order function:
import pyspark.sql.functions as F
df.withColumn("id",F.expr("transform(id,x-> replace(x,'.',''))")).show()
+------+------------+
|number|          id|
+------+------------+
|    12|[12, AZ, UI]|
|    14|[CL, RT, OP]|
+------+------------+
Upvotes: 2