Reputation: 5480
I have a data frame like below in PySpark:
+-----+---+---+----+
|tests|val|asd|cnty|
+-----+---+---+----+
|test1| Y| 1|null|
|test2| N| 2| UK|
| null| Y| 1| UK|
|test1| N| 2|null|
| null| N| 3|null|
|test3| N| 4| AUS|
|test4| Y| 5|null|
+-----+---+---+----+
I want to update the val column: if any row for a given tests value has val Y, then the val of every row for that tests should be updated to Y; otherwise the rows keep whatever values they already have. Records where the tests column is null should be excluded from this update and keep their original val.
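For reference, a minimal sketch to reproduce this DataFrame (assuming an active SparkSession named spark):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [
    ('test1', 'Y', 1, None),
    ('test2', 'N', 2, 'UK'),
    (None, 'Y', 1, 'UK'),
    ('test1', 'N', 2, None),
    (None, 'N', 3, None),
    ('test3', 'N', 4, 'AUS'),
    ('test4', 'Y', 5, None),
]
df = spark.createDataFrame(data, ['tests', 'val', 'asd', 'cnty'])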
I have tried the following:
from pyspark.sql import Window
import pyspark.sql.functions as f

# replace val with the max val within each tests partition
df1 = df.select(
    'tests',
    f.max('val').over(Window.partitionBy('tests')).alias('val'),
    'asd', 'cnty'
)
I am getting the result below. Note that both rows with a null tests value were updated to Y, because all null tests fall into the same window partition:
+-----+---+---+----+
|tests|val|asd|cnty|
+-----+---+---+----+
|test1| Y| 1|null|
|test1| Y| 2|null|
|test2| N| 2| UK|
|test3| N| 4| AUS|
|test4| Y| 5|null|
| null| Y| 1| UK|
| null| Y| 3|null|
+-----+---+---+----+
However, I want the result to look like below, with the null-tests rows left unchanged:
+-----+---+---+----+
|tests|val|asd|cnty|
+-----+---+---+----+
|test1| Y| 1|null|
|test1| Y| 2|null|
|test2| N| 2| UK|
|test3| N| 4| AUS|
|test4| Y| 5|null|
| null| Y| 1| UK|
| null| N| 3|null|
+-----+---+---+----+
Upvotes: 1
Views: 5179
Reputation: 41957
All you need is an additional when condition that checks whether tests is null (or holds the literal string 'null'):
from pyspark.sql import Window
import pyspark.sql.functions as f

# keep the original val where tests is null; otherwise take the partition max
df1 = df.select(
    'tests',
    f.when(f.isnull(f.col('tests')) | (f.col('tests') == 'null'), f.col('val'))
     .otherwise(f.max('val').over(Window.partitionBy('tests')))
     .alias('val'),
    'asd', 'cnty'
)
which would give you
+-----+---+---+----+
|tests|val|asd|cnty|
+-----+---+---+----+
|test4| Y| 5|null|
|test3| N| 4| AUS|
|test1| Y| 1|null|
|test1| Y| 2|null|
|test2| N| 2| UK|
| null| Y| 1| UK|
| null| N| 3|null|
+-----+---+---+----+
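As a side note, f.max works as the "any Y" test here because 'Y' sorts after 'N' in string ordering. Also, the f.col('tests') == 'null' comparison is only needed if the column can contain the literal string 'null' (e.g. from a raw CSV load); if tests only ever holds real nulls, a minimal sketch under that assumption would be:
from pyspark.sql import Window
import pyspark.sql.functions as f

# assumes tests contains real nulls only, never the string 'null'
df1 = df.select(
    'tests',
    f.when(f.col('tests').isNull(), f.col('val'))
     .otherwise(f.max('val').over(Window.partitionBy('tests')))
     .alias('val'),
    'asd', 'cnty'
)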
I hope this answer is helpful.
Upvotes: 1