Shantanu Jain

Reputation: 87

Unable to assign new value to a column in pyspark dataframe using column attribute

I have a PySpark DataFrame, event1. It has many columns, and one of them, eventAction, holds categorical values such as 'conversion', 'check-out', etc.

I want to convert this column so that 'conversion' becomes 1 and every other category becomes 0.

This is what I tried:

event1.eventAction = event1.select(F.when(F.col('eventAction') == 'conversion', 1).otherwise(0))
event1.show()

But the eventAction column is unchanged when I call .show().

Upvotes: 0

Views: 775

Answers (1)

mck

Reputation: 42392

Spark DataFrames are immutable, so you cannot change a column in place by assigning to it with dot notation. Instead, use withColumn to create a new DataFrame in which the existing column is replaced, and assign the result back to event1:

import pyspark.sql.functions as F

# withColumn returns a new DataFrame; reusing the name 'eventAction' replaces that column
event1 = event1.withColumn(
    'eventAction',
    F.when(F.col('eventAction') == 'conversion', 1).otherwise(0)
)

Upvotes: 1
