Reputation: 69
i have a dataframe df=
name state
null CA
Julia null
Robert null
null NJ
where both name
and state
columns are string columns
I want to replace string 'null'
from name
column into None
.
When I tried using the replace
function as below, it is converting all values from name
column to None
. It is not the result I expect, I want only 'null'
values to be converted to None
.
df = df.withColumn('name', regexp_replace('name', 'null', None))
I am getting output as below, I guess it is not able to recognize 'null'
.
name state
null CA
null null
null null
null NJ
How to solve this issue?
Upvotes: 1
Views: 439
Reputation: 3348
This would work
df.withColumn("name", F.when((F.col('name') == "null"), None)).show()
Output with filter on String "null"
(as None
's string representation is also null
, can be misleading by just the output)
df.filter(F.col('name').eqNullSafe("null")).show()
df.withColumn("name", F.when((F.col('name') == "null"), None)).filter(F.col('name').eqNullSafe("null")).show()
Upvotes: 1