user8510536

How to replace a string in a column with other string from the same column

I have the DataFrame below.

id,code
1,GSTR
2,GSTR
3,NA
4,NA
5,NA

Here GSTR is only an example; the value can be anything. I want to replace NA with the other string that is present in the same column, which in this case is GSTR. I tried to use UDFs, but because the replacement string is not known in advance, I could not figure out how to do it.

Note: the code column will only ever contain two distinct strings. One is "NA" and the other can be anything; in this example it is GSTR.

Expected output:

1,GSTR
2,GSTR
3,GSTR
4,GSTR
5,GSTR

Upvotes: 0

Views: 711

Answers (1)

Suresh

Reputation: 5870

Since the column holds only one string other than NA, we can fetch that string and write it to every row:

>>> from pyspark.sql import functions as F
>>> df = spark.createDataFrame([(1,'GSTR'),(2,'GSTR'),(3,'NA'),(4,'NA'),(5,'NA')],['id','code'])
>>> df.show()
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|  NA|
|  4|  NA|
|  5|  NA|
+---+----+
>>> rstr = df.where(df.code != 'NA')[['code']].first().code
>>> df.withColumn('code',F.lit(rstr)).show()
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|GSTR|
|  4|GSTR|
|  5|GSTR|
+---+----+

Hope this helps.

Upvotes: 1
