David

Reputation: 51

How to generate the max values for new columns in PySpark dataframe?

Suppose I have a pyspark dataframe df.

+---+---+
|  a|  b|
+---+---+
|  1|200|
|  2|300|
|  4| 50|
+---+---+

I'd like to add a new column c, defined as:

column c = max(0, column b - 100)

+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|200|100|
|  2|300|200|
|  4| 50|  0|
+---+---+---+

How should I generate the new column c in a PySpark dataframe? Thanks in advance!

Upvotes: 0

Views: 32

Answers (1)

Shibiraj

Reputation: 769

Hope you are looking for something like this:

from pyspark.sql.functions import col, lit, greatest

df = spark.createDataFrame(
    [
        (1, 200),
        (2, 300),
        (4, 50),
    ],
    ["a", "b"],
)

# greatest() returns the largest of its arguments, evaluated row by row,
# so this clamps b - 100 at a minimum of 0
df_new = df.withColumn("c", greatest(lit(0), col("b") - lit(100)))
df_new.show()

Upvotes: 1
