LN_P

Reputation: 1488

Modify values across all columns in pyspark

I have a pyspark data frame and I'd like to have a conditional replacement of a string across multiple columns, not just one. To be more concrete: I'd like to replace the string 'HIGH' with 1, and everything else in the column with 0. [Or at least replace every 'HIGH' with 1.] In pandas I would do:

df[df == 'HIGH'] = 1

Is there a way to do something similar? Or can I do a loop?

I'm new to pyspark so I don't know how to generate example code.

Upvotes: 1

Views: 1376

Answers (1)

Tim

Reputation: 2843

You can use the replace method for this:

>>> df.replace("HIGH", "1")

Keep in mind that you'll need to replace like for like datatypes, so attempting to replace the string "HIGH" with the integer 1 will throw an exception.

Edit: You could also use regexp_replace (from pyspark.sql.functions) to address both parts of your question, but you'd need to apply it to each column in turn:

>>> from pyspark.sql.functions import regexp_replace
>>> df = df.withColumn("col1", regexp_replace("col1", "^(?!HIGH$).*$", "0"))
>>> df = df.withColumn("col1", regexp_replace("col1", "^HIGH$", "1"))

The order matters: everything that isn't exactly "HIGH" is zeroed first, then "HIGH" becomes "1". (The $ inside the lookahead keeps values that merely start with "HIGH", like "HIGHER", from slipping through.)
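Outside Spark, the same two substitutions can be checked with plain re.sub, which uses the same regex semantics. A small sketch, assuming the anchored lookahead ^(?!HIGH$) so that values merely starting with "HIGH" also map to "0":

```python
import re

def high_to_binary(value: str) -> str:
    # Same two patterns as the regexp_replace calls, applied in order:
    # anything that is not exactly 'HIGH' becomes '0', then 'HIGH' becomes '1'.
    value = re.sub(r"^(?!HIGH$).*$", "0", value)
    return re.sub(r"^HIGH$", "1", value)

print(high_to_binary("HIGH"))    # -> '1'
print(high_to_binary("LOW"))     # -> '0'
print(high_to_binary("HIGHER"))  # -> '0'
```

Reversing the two substitutions would be a bug: "HIGH" would first become "1", and "1" would then be zeroed by the catch-all pattern.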

Upvotes: 2
