bluesummers

Reputation: 12607

Updating a column in pyspark dependent on the column current value

Let's say we are given a DataFrame:

+-----+-----+-----+
|    x|    y|    z|
+-----+-----+-----+
|    3|    5|    9|
|    2|    4|    6|
+-----+-----+-----+

I want to multiply all of the values in the z column by the value in the y column wherever the z column equals 6.
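For reference, the example DataFrame can be constructed like this (a minimal sketch; the SparkSession setup is an assumption, not part of the original post):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the example DataFrame shown in the table above
df = spark.createDataFrame([(3, 5, 9), (2, 4, 6)], ['x', 'y', 'z'])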

This post shows the solution I am aiming for, using the following code:

from pyspark.sql import functions as F

df = df.withColumn('z',
    F.when(df['z']==6, df['z']*df['y']).
    otherwise(df['z']))

The problem is that df['z'] and df['y'] are recognized as Column objects, and casting them won't work...

How can I do this correctly?

Upvotes: 1

Views: 509

Answers (1)

mrsrinivas

Reputation: 35404

from pyspark.sql import functions as F
from pyspark.sql.types import LongType

# Where z equals 6, multiply z by y (cast to LongType to avoid
# integer overflow); otherwise keep the original z value.
df = df.withColumn('new_col',
                   F.when(df.z == 6,
                          df.z.cast(LongType()) * df.y.cast(LongType()))
                    .otherwise(df.z))
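
Applied to the example DataFrame above, df.show() should print something like this (the second row, where z equals 6, gets 6 * 4 = 24; the first row keeps its z value of 9):

+---+---+---+-------+
|  x|  y|  z|new_col|
+---+---+---+-------+
|  3|  5|  9|      9|
|  2|  4|  6|     24|
+---+---+---+-------+

To overwrite z in place, as the question asks, pass 'z' instead of 'new_col' as the first argument to withColumn.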

Upvotes: 1
