Shebang_John

Reputation: 121

How to merge two Spark DataFrames using if/else conditions

How can we merge two DataFrames into a new one using a condition? For example, if a row is present in DataFrame B, use the row from B; otherwise, use the row from DataFrame A.
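The rule "take B's row when its key exists in B, otherwise keep A's row" is a keyed override. A minimal sketch with plain Scala collections (no Spark), assuming `Name` is a unique key within each table; the `Rec` case class and the `KeyedOverride` object are hypothetical, and `LastTime` is kept as a string for simplicity:

```scala
// Hypothetical row type mirroring the four columns in the question.
final case class Rec(name: String, lastTime: String, duration: Int, status: String)

object KeyedOverride {
  // For every name in A, B's record wins if B has that name; otherwise A's record is kept.
  // Names that exist only in B are appended at the end.
  // Assumes each name appears at most once per input.
  def merge(a: Seq[Rec], b: Seq[Rec]): Seq[Rec] = {
    val bByName = b.map(r => r.name -> r).toMap
    val fromA   = a.map(r => bByName.getOrElse(r.name, r))
    val aNames  = a.map(_.name).toSet
    fromA ++ b.filterNot(r => aNames.contains(r.name))
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(
      Rec("Bob", "2015-04-23 12:33:00", 1, "logout"),
      Rec("Alice", "2015-04-20 12:33:00", 5, "login"))
    val b = Seq(Rec("Bob", "2015-04-24 00:33:00", 1, "login"))
    merge(a, b).foreach(println)
  }
}
```

If only A's keys should survive (the "whole data in A" reading), drop the appended `b.filterNot(...)` part.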

DataFrame A

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-23 12:33:00|       1|logout|
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

DataFrame B

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1| login|
+-----+-------------------+--------+------+

I want to form a new DataFrame that contains all the data in DataFrame A, but with rows updated using the data in B:

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1| login|
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

I tried a full outer join:

import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark
val joined = dfa.as("a").join(dfb.as("b"), $"a.Name" === $"b.Name", "outer")

But the result had duplicate columns (Name, LastTime, Duration and Status from each side). How can I ignore the row in the first table when a corresponding row is present in the second?
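One way to avoid the side-by-side columns altogether (a different technique from the `coalesce` answer): a left anti join keeps only the rows of A whose `Name` has no match in B, and `unionByName` then appends all of B's rows, so the output has exactly one set of columns. A sketch, assuming `Name` identifies a row, both DataFrames have the same column names, Spark 2.3+ (for `unionByName`), and a local SparkSession; the object name `AntiJoinUnion` is hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AntiJoinUnion {
  // Rows of dfa whose Name does not occur in dfb, plus every row of dfb.
  // Rows that exist only in dfb are therefore included too.
  def upsert(dfa: DataFrame, dfb: DataFrame): DataFrame =
    dfa.join(dfb, Seq("Name"), "left_anti").unionByName(dfb)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("upsert").getOrCreate()
    import spark.implicits._
    // LastTime is kept as a string here for brevity.
    val dfa = Seq(("Bob", "2015-04-23 12:33:00", 1, "logout"),
                  ("Alice", "2015-04-20 12:33:00", 5, "login"))
      .toDF("Name", "LastTime", "Duration", "Status")
    val dfb = Seq(("Bob", "2015-04-24 00:33:00", 1, "login"))
      .toDF("Name", "LastTime", "Duration", "Status")
    upsert(dfa, dfb).show()
    spark.stop()
  }
}
```

Row order after a union is not guaranteed, so add an `orderBy` if the output order matters.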

Upvotes: 1

Views: 576

Answers (1)

Naihuangbao

Reputation: 36

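import org.apache.spark.sql.functions.{coalesce, col} // left join keeps every row of dfa; dfb listed first in coalesce so B's values win
val combined_df = dfa.join(dfb, Seq("Name"), "left").select(col("Name"), coalesce(dfb("LastTime"), dfa("LastTime")).as("LastTime"), coalesce(dfb("Duration"), dfa("Duration")).as("Duration"), coalesce(dfb("Status"), dfa("Status")).as("Status"))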
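To try the join-plus-`coalesce` idea on the sample data from the question, here is a self-contained sketch. It joins with `"left"` so every row of `dfa` survives, and lists `dfb`'s column first in each `coalesce` so that B's values take precedence when a match exists. It assumes a local SparkSession and keeps `LastTime` as a string; the object name `CoalesceMerge` is hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object CoalesceMerge {
  // Keep every row of dfa; for names also present in dfb, take dfb's value per column.
  // Note: coalesce works column by column, so a null in dfb falls back to dfa's value.
  def merge(dfa: DataFrame, dfb: DataFrame): DataFrame =
    dfa.join(dfb, Seq("Name"), "left").select(
      col("Name"),
      coalesce(dfb("LastTime"), dfa("LastTime")).as("LastTime"),
      coalesce(dfb("Duration"), dfa("Duration")).as("Duration"),
      coalesce(dfb("Status"), dfa("Status")).as("Status"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-merge").getOrCreate()
    import spark.implicits._
    val dfa = Seq(("Bob", "2015-04-23 12:33:00", 1, "logout"),
                  ("Alice", "2015-04-20 12:33:00", 5, "login"))
      .toDF("Name", "LastTime", "Duration", "Status")
    val dfb = Seq(("Bob", "2015-04-24 00:33:00", 1, "login"))
      .toDF("Name", "LastTime", "Duration", "Status")
    merge(dfa, dfb).show()
    spark.stop()
  }
}
```

Switching the join type to `"full_outer"` would also keep names that appear only in `dfb`, but `col("Name")` would then need to come from the merged join key, which a `Seq("Name")` join already provides.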

Upvotes: 1
