aamirmalik124

Reputation: 125

Add a new column to a PySpark DataFrame by comparing two columns in the same DataFrame

I have a data frame with two columns COL_1 and COL_2.

I want to add a third column, COL_3, whose value depends on comparing COL_1 and COL_2 according to the rules below:

When both values are the same, COL_3 = Valid

When the values are different, COL_3 = Invalid

When both values are null, COL_3 = null
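
As a plain-Python cross-check of these three rules (this is not Spark code), they can be sketched as a small function. The name `expected_col3` and the sample values are hypothetical. The rules do not say what should happen when only one value is null, so treating that case as Invalid is an assumption.

```python
from typing import Optional


def expected_col3(col_1: Optional[str], col_2: Optional[str]) -> Optional[str]:
    """Reference (non-Spark) version of the COL_3 rule described above."""
    if col_1 is None and col_2 is None:
        return None  # both values null -> COL_3 is null
    if col_1 is None or col_2 is None:
        # ASSUMPTION: this case is not specified in the question.
        return "Invalid"
    return "Valid" if col_1 == col_2 else "Invalid"


# Hypothetical sample rows, one per rule.
sample_rows = [("A", "A"), ("A", "B"), (None, None)]
for a, b in sample_rows:
    print(a, b, expected_col3(a, b))
```

A function like this can serve as an oracle when checking the output of a Spark expression on a small sample.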

I tried the code below, but it is not working.

df_Input = dataframe.withColumn("COL_3", (col("COL_1") != col("COL_1")), lit("Invalid")).otherwise(lit("valid"))

Upvotes: 0

Views: 665

Answers (2)

aamirmalik124

Reputation: 125

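```
from pyspark.sql.functions import col, lit, when

df = df.withColumn(
    "COL_3",
    when(col("COL_1") == col("COL_2"), "Valid")      # values are equal
    .when(col("COL_1") != col("COL_2"), "Invalid")   # values differ
    .otherwise(lit("NA")),                           # reached when a comparison is null
)
```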

Here I add a new column, COL_3, and use the when function to check whether COL_1 and COL_2 are equal or different, assigning Valid or Invalid accordingly. If COL_1 and COL_2 are both null, neither condition matches, so otherwise assigns NA to COL_3.
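
Why the null rows end up in the otherwise branch can be illustrated with a small pure-Python model of SQL-style comparison, where comparing with null yields null rather than true or false. This is only a sketch of the semantics, not PySpark itself. The helper names `sql_eq`, `sql_ne`, and `when_chain` are hypothetical, and `None` stands in for null.

```python
from typing import Optional


def sql_eq(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Model of SQL '=': the result is null (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def sql_ne(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Model of SQL '!=': also null (None) if either side is null."""
    eq = sql_eq(a, b)
    return None if eq is None else not eq


def when_chain(a: Optional[str], b: Optional[str]) -> str:
    """Mimic when(==, 'Valid').when(!=, 'Invalid').otherwise('NA')."""
    for condition, label in ((sql_eq, "Valid"), (sql_ne, "Invalid")):
        # A branch is taken only when its condition is exactly True;
        # a null (None) condition counts as "not matched".
        if condition(a, b) is True:
            return label
    return "NA"


for a, b in [("A", "A"), ("A", "B"), (None, None), ("A", None)]:
    print(a, b, when_chain(a, b))
```

In this model, a row where only one of the two values is null also falls through to NA. To test for null explicitly in PySpark, `Column.isNull()` and `Column.eqNullSafe()` are available.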

Upvotes: 0

Jay Kakadiya

Reputation: 541

First, add COL_3 with a default value using the lit function while reading the CSV file:

df = spark.read.format("csv").option("header", "true").option("delimiter","|").load('test.csv').withColumn('COL_3',lit('Invalid'))

Next, apply the conditions using the when function:

df = df.withColumn('COL_3', when((col("COL_1") == 'null') & (col("COL_2") == 'null'), 'null').when(col("COL_1") == col("COL_2"), 'Valid').otherwise(col('COL_3')))

Upvotes: 0
