Viv

Reputation: 1584

Update column with a where clause in Pyspark

How do I update a column in a PySpark dataframe with a where clause?

This is similar to the following SQL operation:

   UPDATE table1 SET alpha1 = x WHERE alpha2 < 6;

where alpha1 and alpha2 are columns of table1.

For example, I have a dataframe table1 with the values below:

table1

alpha1    alpha2
3         7
4         5
5         4
6         8 

dataframe table1 after the update:

alpha1    alpha2
3         7
x         5
x         4
6         8

How can I do this with a PySpark dataframe?

Upvotes: 2

Views: 4482

Answers (1)

Assaf Mendelson

Reputation: 13001

You are looking for the when function (pyspark.sql.functions.when):

import pyspark.sql.functions as F

# alpha1 is created as a string column so it can hold the literal "x" later
df = spark.createDataFrame([("3", 7), ("4", 5), ("5", 4), ("6", 8)], ["alpha1", "alpha2"])
df.show()
>>> +------+------+
>>> |alpha1|alpha2|
>>> +------+------+
>>> |     3|     7|
>>> |     4|     5|
>>> |     5|     4|
>>> |     6|     8|
>>> +------+------+

# Replace alpha1 with "x" where alpha2 < 6; keep the original value otherwise
df2 = df.withColumn("alpha1", F.when(df["alpha2"] < 6, "x").otherwise(df["alpha1"]))
df2.show()
>>> +------+------+
>>> |alpha1|alpha2|
>>> +------+------+
>>> |     3|     7|
>>> |     x|     5|
>>> |     x|     4|
>>> |     6|     8|
>>> +------+------+

Upvotes: 7
