Reputation: 217
I would like to mutate my dataframe based on two boolean conditions combined with a bitwise AND. In R (dplyr) I would write:
df %>% mutate(newVariable = ifelse(variable1 == "value1" & variable2 == "value2", variable3, NULL))
So in PySpark I tested this:
import pyspark.sql.functions as func
df.withColumn("newVariable", func.when( \
func.col("variable1") == "value1" & func.col("variable2") == "value2", \
func.col("variable3")))
But I get an error.
What is the correct way to create this kind of new variable with a Spark DataFrame?
Upvotes: 1
Views: 6472
Reputation: 330283
You have to remember about operator precedence. In Python, & has a higher precedence than ==, so the individual equality checks have to be parenthesized:
(func.col("variable1") == "value1") & (func.col("variable2") == "value2")
Otherwise & is applied first, to "value1" and func.col("variable2"), and the two == become a chained comparison around that result:
func.col("variable1") == ("value1" & func.col("variable2")) == "value2"
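The grouping can be checked without a Spark session by asking Python's own parser how it reads the two expressions. This is only a sketch using the standard ast module. The name col is a placeholder standing in for func.col, and top_level is a hypothetical helper. Nothing is evaluated, the strings are only parsed.

```python
import ast

def top_level(expr: str) -> ast.expr:
    """Parse a Python expression and return its top-level AST node."""
    return ast.parse(expr, mode="eval").body

# `col` is a placeholder name; the expressions are parsed, never executed.
unparenthesized = top_level('col("variable1") == "value1" & col("variable2") == "value2"')
parenthesized = top_level('(col("variable1") == "value1") & (col("variable2") == "value2")')

# Without parentheses: the top-level node is a comparison, and its middle
# operand is the `&` operation ("value1" & col("variable2")).
print(type(unparenthesized).__name__)                 # Compare
print(type(unparenthesized.comparators[0]).__name__)  # BinOp

# With parentheses: `&` is the top-level operation joining two comparisons.
print(type(parenthesized).__name__)                   # BinOp
print(type(parenthesized.left).__name__)              # Compare
```

In the unparenthesized form the string literal "value1" ends up as an operand of &, which is not what the condition was meant to express.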
Upvotes: 5