Chique_Code

Reputation: 1530

replace multiple values with PySpark

I need to replace the values in the "ActionName" column with specific values. My data looks like this:

id           ActionName
1         First quartile
2         Midpoint
3         Third quartile
4         Complete

I want to replace those values with numbers. Expected output:

id          ActionName
1           0
2           1
3           2
4           3

I tried the following:

df_new = df.withColumn("ActionName", when (col("ActionName").isin("First quartile"), 0), \
                              (col("ActionName").isin("Midpoint"), 1))

Error: TypeError: withColumn() takes 3 positional arguments but 4 were given

Upvotes: 0

Views: 182

Answers (1)

mck

Reputation: 42422

You can chain another when onto the first when call. Each chained when acts like an else-if branch.

from pyspark.sql.functions import col, when
df_new = df.withColumn("ActionName", when(col("ActionName") == "First quartile", 0).when(col("ActionName") == "Midpoint", 1))

For further replacements, chain more when calls, e.g. when(..., 1).when(..., 2).when(..., 3). Rows that match no condition become null unless the chain ends with otherwise(...).

Upvotes: 2
