Mauro Gentile

Reputation: 1511

Change a pyspark column based on the value of another column

I have a pyspark dataframe, called df.

ONE LINE EXAMPLE:

df.take(1)
[Row(data=u'2016-12-25',nome=u'Mauro',day_type="SUN")]

I have a list of holiday dates:

holydays=[u'2016-12-25',u'2016-12-08'....]

I want to set day_type to "HOLIDAY" if "data" is in the holydays list; otherwise I want to leave the day_type field as it is.

This is my non-working attempt:

df=df.withColumn("day_type",when(col("data") in holydays, "HOLIDAY").otherwise(col("day_type")))

PySpark does not like the expression "in holydays". It returns this error:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' 

Upvotes: 1

Views: 11627

Answers (2)

desertnaut

Reputation: 60321

Regarding your first question - you need isin:

spark.version
# u'2.2.0'

from pyspark.sql import Row
from pyspark.sql.functions import col, when

df=spark.createDataFrame([Row(data=u'2016-12-25',nome=u'Mauro',day_type="SUN")])

holydays=[u'2016-12-25',u'2016-12-08']

df.withColumn("day_type",when(col("data").isin(holydays), "HOLIDAY").otherwise(col("day_type"))).show()
# +----------+--------+-----+
# |      data|day_type| nome|
# +----------+--------+-----+
# |2016-12-25| HOLIDAY|Mauro|
# +----------+--------+-----+

Regarding your second question - I don't see any issue:

df.withColumn("day_type",when(col("data")=='2016-12-25', "HOLIDAY").otherwise(col("day_type"))).filter("day_type='HOLIDAY'").show()
# +----------+--------+-----+
# |      data|day_type| nome|
# +----------+--------+-----+
# |2016-12-25| HOLIDAY|Mauro|
# +----------+--------+-----+

BTW, it's always a good idea to provide a little more than a single row of sample data...

Upvotes: 4

Harsh Bafna

Reputation: 2224

Use the column's isin method instead of Python's in operator to check whether the value is present in a list. Sample code:

from pyspark.sql.functions import when

df=df.withColumn("day_type",when(df.data.isin(holydays), "HOLIDAY").otherwise(df.day_type))
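
As a rough illustration of why `in` fails where `isin` works, here is a plain-Python sketch that runs without Spark. `FakeColumn` is a hypothetical stand-in, not PySpark's actual `Column` class; it only mimics the two behaviours assumed here: `==` returns an expression object rather than True/False, and converting that expression to a bool raises ValueError.

```python
class FakeColumn:
    """Hypothetical stand-in for a lazy column expression (not PySpark's Column)."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} = {other!r})")

    def __bool__(self):
        # A lazy expression has no single truth value, so refuse the conversion.
        raise ValueError("Cannot convert column into bool")

    def isin(self, values):
        # Build one membership expression without ever calling bool().
        return FakeColumn(f"({self.expr} IN {tuple(values)!r})")


holydays = ["2016-12-25", "2016-12-08"]
data = FakeColumn("data")

try:
    # List membership compares with == and then calls bool() on each result.
    data in holydays
    in_failed = False
except ValueError:
    in_failed = True

membership = data.isin(holydays)
print(in_failed)
print(membership.expr)
```

Under these assumptions, Python's `in` forces each comparison result into a bool, which a lazily evaluated column cannot provide, while `isin` simply returns another expression that `when` can consume.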

Upvotes: 2
