Metadata
Metadata

Reputation: 2083

How to replace a value in dataframe based on a value from another column in same dataframe?

I have dataframe obtained after reading a table as below using spark:

val orgDF = spark.read.format("jdbc").options("url", url).options("dbtable","select id, org_name, delete_state, soft_del, 0 as soft_del from schema.table as orgTable").options("user",username)options("password",pwd).load()

I can see the output data from the dataframe as below:

-----------------------------------------
id | org_name  | delete_state | soft_del
-----------------------------------------
1  | Net       | delete       |   0
2  | Vert      | delete       |   0
3  | Bio       | insert       |   0
4  | Card      | delete       |   0
7  | stock     | update       |   0
-----------------------------------------

Before saving the dataframe on HDFS, I am trying to set the value of the col: soft_del as '1' if the vale of the column in delete_state is delete and make a final dataframe as below:

  -----------------------------------------
    id | org_name  | delete_state | soft_del
    -----------------------------------------
    1  | Net       | delete       |   1
    2  | Vert      | delete       |   1
    3  | Bio       | insert       |   0
    4  | Card      | delete       |   1
    7  | Stock     | update       |   0
    -----------------------------------------

I know there is a way to do it as below:

orgDF.createOrReplaceTempView("orgData")
spark.sql("update orgData set soft_del = 1 where delete_state = 'delete'")

I am also trying to get to know how to use dataframe functions and do it but unable to find correct material. Could anyone let know how can I do it using dataframe functions ?

Upvotes: 0

Views: 53

Answers (1)

Ashwanth Kumar
Ashwanth Kumar

Reputation: 677

You can try something like this

orgDF.withColumn("soft_del", when($"delete_state" === "delete", 1).otherwise(0))

You can also chain multiple whens if you need, something like

orgDF.withColumn("soft_del", 
  when($"delete_state" === "delete", 1)
  .when($"delete_state" === "update", 2)
  .otherwise(0)
)

Reference

Upvotes: 2

Related Questions