Stéphane Soulier

Reputation: 217

spark-dataframe: Create new column with 2 boolean conditions

I would like to mutate my dataframe based on two boolean conditions combined with a bitwise AND operation:

df %>% mutate(newVariable = ifelse(variable1 == "value1" & variable2 == "value2", variable3, NULL))

So in PySpark I tested this:

import pyspark.sql.functions as func

df.withColumn("newVariable", func.when( \
     func.col("variable1") == "value1" & func.col("variable2") == "value2", \
     func.col("variable3")))

But I get an error.

What is the correct way to create this kind of new variable with a Spark dataframe?

Upvotes: 1

Views: 6472

Answers (1)

zero323

Reputation: 330283

You have to keep operator precedence in mind. In Python, & has higher precedence than ==, so the individual equality checks have to be parenthesized:

(func.col("variable1") == "value1") & (func.col("variable2") == "value2")

Otherwise & is applied first, and the expression is parsed as a chained comparison around it:

func.col("variable1") == ("value1" & func.col("variable2")) == "value2"
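The grouping can be checked without Spark by asking Python's own parser. This is only a minimal sketch: the plain names a and b are stand-ins (assumptions for illustration) for func.col("variable1") and func.col("variable2"), and the standard-library ast module is used to inspect how each form of the condition is parsed.

```python
import ast

# The un-parenthesized condition from the question; a and b are
# hypothetical stand-ins for the two func.col(...) expressions.
source = 'a == "value1" & b == "value2"'
node = ast.parse(source, mode="eval").body

# The top-level node is a single chained comparison with two == operators.
print(type(node).__name__)                      # Compare
print([type(op).__name__ for op in node.ops])   # ['Eq', 'Eq']

# The middle operand is the bitwise AND, which was grouped first.
middle = node.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)  # BinOp BitAnd

# With parentheses, the top-level node is the AND of two comparisons.
fixed = ast.parse('(a == "value1") & (b == "value2")', mode="eval").body
print(type(fixed).__name__, type(fixed.left).__name__, type(fixed.right).__name__)
# BinOp Compare Compare
```

Only the parenthesized form produces an AND whose two operands are the equality checks, which is the shape func.when expects as its condition.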

Upvotes: 5
