karthik

Reputation: 69

pyspark dataframe with regexp_replace function

I have a DataFrame df:

name  state
null   CA
Julia  null
Robert null
null   NJ

where both name and state are string columns.

I want to replace the literal string 'null' in the name column with an actual None (NULL).

When I tried regexp_replace as below, it converted every value in the name column to None. That is not the result I expect; I want only the 'null' strings converted to None.

df = df.withColumn('name', regexp_replace('name', 'null', None))

I get the output below; I guess it is not able to handle None as the replacement.

name  state
null   CA
null   null
null   null
null   NJ

How to solve this issue?

Upvotes: 1

Views: 439

Answers (1)

Ronak Jain

Reputation: 3348

This should work (assuming `import pyspark.sql.functions as F`). Note the `.otherwise` clause: without it, every row that does not match the condition would also become NULL:

df.withColumn("name", F.when(F.col("name") == "null", None).otherwise(F.col("name"))).show()

To verify, filter on the literal string "null" (since a real NULL also displays as null, the show() output alone can be misleading):

df.filter(F.col('name').eqNullSafe("null")).show()
df.withColumn("name", F.when(F.col('name') == "null", None).otherwise(F.col('name'))).filter(F.col('name').eqNullSafe("null")).show()

Output

Upvotes: 1

Related Questions