a1letterword

Reputation: 307

How do I use Boolean logic within a pyspark dataframe for sets

I'm trying to create a new column in a PySpark DataFrame based on the contents of another column. The other column contains only integers, and I want the new column to hold 1 when the value belongs to a given set of codes and 0 otherwise.

import pyspark.sql.functions as F
df2 = df2.withColumn('Industrial', F.when(F.col('CODE') in (1,2,3,4), 1).otherwise(0))

This doesn't work because when() expects a Boolean column expression, which Python's in operator does not produce. Is there a workaround for this?

EDIT: This could still be useful for others, since it creates a new column and does a little more than just an isin() check.

Upvotes: 0

Views: 680

Answers (1)

Akhil Batra

Reputation: 610

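Use the Column.isin method. Python's in operator tries to reduce the comparison to a plain True or False, which a Column expression cannot provide, whereas isin builds the membership test as a Column expression that when() can consume: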

df2 = df2.withColumn('Industrial', F.when(F.col('CODE').isin([1, 2, 3, 4]), 1).otherwise(0))
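
To see why the in operator fails while isin works, here is a minimal pure-Python sketch. ToyColumn is a hypothetical stand-in written for this illustration, not PySpark's actual Column class; it only assumes that a column expression overloads == to build a new expression and refuses to be converted to a bool.

```python
class ToyColumn:
    """Hypothetical stand-in for a lazily evaluated column expression."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning True/False.
        return ToyColumn(f"({self.expr} = {other})")

    def __bool__(self):
        # An unevaluated expression has no truth value.
        raise ValueError("Cannot convert column into bool")

    def isin(self, *values):
        # Accept isin(1, 2, 3) as well as isin([1, 2, 3]).
        if len(values) == 1 and isinstance(values[0], (list, set)):
            values = tuple(values[0])
        joined = ", ".join(str(v) for v in values)
        return ToyColumn(f"({self.expr} IN ({joined}))")


code = ToyColumn("CODE")

try:
    # Tuple membership evaluates bool(1 == code), which raises.
    code in (1, 2, 3, 4)
    in_operator_failed = False
except ValueError:
    in_operator_failed = True

# isin never asks for a truth value; it just returns another expression.
membership = code.isin([1, 2, 3, 4])
print(in_operator_failed, membership.expr)
```

The membership test has to stay an expression so that it can be evaluated per row later, which is why it is exposed as a method rather than through the in operator.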

Upvotes: 1
