I have a PySpark DataFrame, event1. It has many columns, and one of them, eventAction, holds categorical values such as 'conversion', 'check-out', etc. I want to convert this column so that 'conversion' becomes 1 and every other category becomes 0.
This is what I tried:
event1.eventAction = event1.select(F.when(F.col('eventAction') == 'conversion', 1).otherwise(0))
event1.show()
But I don't see any change in the eventAction column when event1.show() is executed.
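Spark DataFrames are immutable, so you cannot change a column by assigning to it with attribute (.) notation. Assigning to event1.eventAction only sets an ordinary Python attribute on the event1 object; the data is untouched. (The right-hand side, event1.select(...), also returns a new single-column DataFrame rather than a column.) Instead, use withColumn, which returns a new DataFrame in which the existing column is replaced, and reassign the result to event1: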
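import pyspark.sql.functions as F

# withColumn returns a new DataFrame with eventAction replaced:
# 'conversion' -> 1, any other value (including null) -> 0.
# Reassign the result to event1 to keep the change.
event1 = event1.withColumn(
    'eventAction',
    F.when(F.col('eventAction') == 'conversion', 1).otherwise(0)
)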