Reputation: 33
Hi, I'm starting to use PySpark and want to put a when and otherwise condition in:
df_1 = df.withColumn("test", when(df.first_name == df2.firstname & df.last_namne == df2.lastname, "1. Match on First and Last Name").otherwise ("No Match"))
I get the error below and would like some help understanding why the above is not working.
Both df.first_name and df.last_name are strings, and df2.firstname and df2.lastname are strings too.
Error: ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Thanks in advance
Upvotes: 0
Views: 131
Reputation: 4108
There are several issues in your statement:

- In df.withColumn(), you cannot use df and df2 columns in one statement. First join the two dataframes using df.join(df2, on="some_key", how="left/right/full").
- Wrap each condition in parentheses: (df.first_name == df2.firstname) & (df.last_name == df2.lastname).
- Wrap the literal values in lit(), like lit("1. Match on First and Last Name") and lit("No Match").
- df.last_namne is a typo; it should be df.last_name.
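Putting those points together, a minimal sketch of the corrected statement (assuming a hypothetical join key "id" and a left join; replace them with whatever fits your data):

from pyspark.sql.functions import when, lit

# Join first so columns from both dataframes are usable in one statement
# ("id" is a hypothetical key; use your real join column and join type).
joined = df.join(df2, on="id", how="left")

df_1 = joined.withColumn(
    "test",
    when(
        (df.first_name == df2.firstname) & (df.last_name == df2.lastname),
        lit("1. Match on First and Last Name"),
    ).otherwise(lit("No Match")),
)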
Upvotes: 1