Reputation: 12607
Let's say I am given a DataFrame:
+-----+-----+-----+
|    x|    y|    z|
+-----+-----+-----+
|    3|    5|    9|
|    2|    4|    6|
+-----+-----+-----+
I want to multiply the values in column z by the values in column y wherever column z equals 6.
This post shows the solution I am aiming for, using the code
    from pyspark.sql import functions as F

    df = df.withColumn(
        'z',
        F.when(df['z'] == 6, df['z'] * df['y']).otherwise(df['z'])
    )
The problem is that df['z'] and df['y'] are recognized as Column objects, and casting them won't work...
How can I do this correctly?
Upvotes: 1
Views: 509
Reputation: 35404
    from pyspark.sql import functions as F
    from pyspark.sql.types import LongType

    # Cast both columns to long before multiplying; rows where z != 6 keep z as-is.
    df = df.withColumn(
        'new_col',
        F.when(
            df.z == 6,
            df.z.cast(LongType()) * df.y.cast(LongType())
        ).otherwise(df.z)
    )
Upvotes: 1