user4046073

Reputation: 871

Spark: add a column to a dataframe with conditions on another df

I have the following problem: I want to add a column RealCity to dataframe A. When the City value is 'noClue', I want to look up the City in df B using the Key.

Table A:

   +---------+--------+
   |     Key |    City|
   +---------+--------+
   |a        |    PDX |
   |b        | noClue |
   +---------+--------+

Table B:

   +---------+--------+
   |     Key |  Name  |
   +---------+--------+
   |c        |    SYD |
   |b        |   AKL  |
   +---------+--------+

I want to use .withColumn and when, but I can't select a value from another table (table B) that way. What's a good way of doing this? Many thanks!

Upvotes: 0

Views: 355

Answers (1)

Ramesh Maharjan

Reputation: 41957

Given that you have two dataframes

A:

+---+------+
|key|City  |
+---+------+
|a  |PDX   |
|b  |noClue|
+---+------+

B:

+---+----+
|key|Name|
+---+----+
|a  |SYD |
|b  |AKL |
+---+----+

You can simply join them on the common Key column and use the withColumn and when functions as

import org.apache.spark.sql.functions.when
// the $"..." column syntax requires: import spark.implicits._

val finalDF = A.join(B, Seq("Key"), "left")
  .withColumn("RealCity", when($"City" === "noClue", $"Name").otherwise($"City"))
  .drop("Name")

The final output should be

+---+------+--------+
|key|City  |RealCity|
+---+------+--------+
|a  |PDX   |PDX     |
|b  |noClue|AKL     |
+---+------+--------+
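For completeness, here is a minimal self-contained sketch of the same approach, assuming a SparkSession is in scope as `spark` (this name, and the inline sample data, are illustrative). The left join keeps every row of A even when its Key has no match in B; `when(...).otherwise(...)` then falls back to the original City whenever it is not 'noClue':

```scala
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._  // enables .toDF on Seq

// Sample data mirroring the dataframes in the answer above
val A = Seq(("a", "PDX"), ("b", "noClue")).toDF("Key", "City")
val B = Seq(("a", "SYD"), ("b", "AKL")).toDF("Key", "Name")

val finalDF = A.join(B, Seq("Key"), "left")
  .withColumn("RealCity",
    when(col("City") === "noClue", col("Name")).otherwise(col("City")))
  .drop("Name")

finalDF.show()
```

Note that a left join is the safe choice here: an inner join would silently drop any row of A whose Key is absent from B, even when its City is already known.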

Upvotes: 7
