jgtrz

Reputation: 375

pySpark withColumn with two conditions

I want to test two conditions: clean_reference.Output == " " and clean_reference.Primary == "DEFAULT". If both conditions apply, the result should be clean_reference.Output, otherwise "NI".

The code below is not accepting clean_reference.Output as my when() value.

final_reference = clean_reference.withColumn("Output",f.when(clean_reference.Output == " ")| (clean_reference.Primary == "DEFAULT"), clean_reference.Output).otherwise("NI")
TypeError: when() missing 1 required positional argument: 'value'

Upvotes: 1

Views: 389

Answers (2)

murtihash

Reputation: 8410

Wrap your columns in f.col() and the value you want to assign in f.lit().

final_reference = clean_reference.withColumn("Output",\
                       f.when((f.col("Output") == " ")|                              
                             (f.col("Primary") ==\
                              "DEFAULT"), f.col("Output"))\
                                             .otherwise(f.lit("NI")))

Upvotes: 2

Ranga Vure

Reputation: 1932

Same logic as your code, just with the parentheses fixed: both conditions go inside when(), and .otherwise() is chained on the result of when().

final_reference = clean_reference.withColumn(
        "OutputItemNameByValue",
        f.when( 
          (clean_reference.OutputItemNameByValue == " ") | 
          (clean_reference.PrimaryLookupAttributeValue == "TRIANA_DEFAULT"),
          clean_reference.OutputItemNameByValue
        ).otherwise("Not Implemented")
)
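
A minor stylistic variation (not from the answer itself): the conditional column can be built separately and then passed to withColumn, which keeps the parentheses easier to follow.

from pyspark.sql import functions as f

# when() returns a Column; otherwise() is chained on that Column, not on withColumn.
out_col = f.when(
    (f.col("OutputItemNameByValue") == " ") |
    (f.col("PrimaryLookupAttributeValue") == "TRIANA_DEFAULT"),
    f.col("OutputItemNameByValue")
).otherwise("Not Implemented")

final_reference = clean_reference.withColumn("OutputItemNameByValue", out_col)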

Upvotes: 1
