Reputation: 2083
I have dataframe obtained after reading a table as below using spark:
val orgDF = spark.read.format("jdbc").options("url", url).options("dbtable","select id, org_name, delete_state, soft_del, 0 as soft_del from schema.table as orgTable").options("user",username)options("password",pwd).load()
I can see the output data from the dataframe as below:
-----------------------------------------
id | org_name | delete_state | soft_del
-----------------------------------------
1 | Net | delete | 0
2 | Vert | delete | 0
3 | Bio | insert | 0
4 | Card | delete | 0
7 | stock | update | 0
-----------------------------------------
Before saving the dataframe on HDFS, I am trying to set the value of the col: soft_del as '1' if the vale of the column in delete_state
is delete
and make a final dataframe as below:
-----------------------------------------
id | org_name | delete_state | soft_del
-----------------------------------------
1 | Net | delete | 1
2 | Vert | delete | 1
3 | Bio | insert | 0
4 | Card | delete | 1
7 | Stock | update | 0
-----------------------------------------
I know there is a way to do it as below:
orgDF.createOrReplaceTempView("orgData")
spark.sql("update orgData set soft_del = 1 where delete_state = 'delete'")
I am also trying to get to know how to use dataframe functions and do it but unable to find correct material. Could anyone let know how can I do it using dataframe functions ?
Upvotes: 0
Views: 53
Reputation: 677
You can try something like this
orgDF.withColumn("soft_del", when($"delete_state" === "delete", 1).otherwise(0))
You can also chain multiple when
s if you need, something like
orgDF.withColumn("soft_del",
when($"delete_state" === "delete", 1)
.when($"delete_state" === "update", 2)
.otherwise(0)
)
Reference
when
function.Upvotes: 2