How to replace a string in a column with another string from the same column

Question

I have the DataFrame below.

id,code

1,GSTR

2,GSTR

3,NA

4,NA

5,NA

Here GSTR may change; it can be anything. I want to replace NA with the other string that is present in the same column.

In this case, I want to replace NA with the other string in the column, i.e. GSTR. I tried to use UDFs, but because the replacement string is not known in advance, I could not figure it out.

Note: the code column will contain only two distinct strings. One is "NA" and the other can be anything; in this case it is GSTR.

Expected output

1,GSTR

2,GSTR

3,GSTR

4,GSTR

5,GSTR

Suresh · Accepted Answer

We can take the distinct string other than NA and use it as the replacement:

>>> from pyspark.sql import functions as F
>>> df = spark.createDataFrame([(1,'GSTR'),(2,'GSTR'),(3,'NA'),(4,'NA'),(5,'NA')],['id','code'])
>>> df.show()
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|  NA|
|  4|  NA|
|  5|  NA|
+---+----+
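>>> # Fetch the one non-'NA' value from the column (assumes at least one such row exists)
>>> rstr = df.where(df.code != 'NA').select('code').first().code
>>> # Every row ends up with this value, so overwrite the whole column with a literal
>>> df.withColumn('code', F.lit(rstr)).show()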
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|GSTR|
|  4|GSTR|
|  5|GSTR|
+---+----+
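
The same two-step logic can be sketched in plain Python, which makes it easy to see the assumption it relies on: the column holds exactly one value other than "NA". This is only an illustration, not Spark code; the function name `fill_placeholder` and the list-of-dicts row layout are hypothetical and do not come from the question.

```python
def fill_placeholder(rows, placeholder="NA"):
    """Replace `placeholder` in the 'code' field with the single other value."""
    # Step 1: collect the distinct values that are not the placeholder.
    others = {row["code"] for row in rows if row["code"] != placeholder}
    if len(others) != 1:
        # The approach only makes sense when exactly one other value exists.
        raise ValueError(
            f"expected exactly one non-placeholder value, found {len(others)}"
        )
    replacement = next(iter(others))
    # Step 2: write that value into the 'code' field of every row.
    return [{**row, "code": replacement} for row in rows]


rows = [
    {"id": 1, "code": "GSTR"},
    {"id": 2, "code": "GSTR"},
    {"id": 3, "code": "NA"},
    {"id": 4, "code": "NA"},
    {"id": 5, "code": "NA"},
]
result = fill_placeholder(rows)
print([row["code"] for row in result])
```

The explicit check is the part worth carrying over to the Spark version: if the column contained only "NA", `first()` would return None and the `.code` lookup would fail, so it may be worth guarding against that case before calling `withColumn`.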

Hope this helps.
