Reputation: 871
I have the following problem: I want to add a column RealCity to dataframe A.
When the City value is 'noClue', I want to look up the city in dataframe B, using the Key.
Table A:
+-----+--------+
| Key | City   |
+-----+--------+
| a   | PDX    |
| b   | noClue |
+-----+--------+
Table B:
+-----+------+
| Key | Name |
+-----+------+
| c   | SYD  |
| b   | AKL  |
+-----+------+
I want to use .withColumn and when, but I can't select a value from another table (table B) that way. What's a good way of doing this? Many thanks!
Upvotes: 0
Views: 355
Reputation: 41957
Given that you have two dataframes
A:
+---+------+
|key|City  |
+---+------+
|a  |PDX   |
|b  |noClue|
+---+------+
B:
+---+----+
|key|Name|
+---+----+
|a  |SYD |
|b  |AKL |
+---+----+
You can simply join them on the common Key column and use withColumn with the when function:
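import org.apache.spark.sql.functions.when  // the $"..." syntax also needs: import spark.implicits._
val finalDF = A.join(B, Seq("Key"), "left")
  .withColumn("RealCity", when($"City" === "noClue", $"Name").otherwise($"City"))
  .drop("Name")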
You should get the following output:
+---+------+--------+
|key|City  |RealCity|
+---+------+--------+
|a  |PDX   |PDX     |
|b  |noClue|AKL     |
+---+------+--------+
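If you want to try this end to end, here is a minimal self-contained sketch of the same join. It assumes a local SparkSession and rebuilds the two tables from the question by hand. The names RealCityExample and addRealCity are made up for illustration. The coalesce is an addition that is not in the one-liner above: with a left join, a 'noClue' row whose Key has no match in B would otherwise get a null RealCity, so this version falls back to the original City value instead. Drop the coalesce if null is what you want there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object RealCityExample {
  // Hypothetical helper: left-join A with B on Key, then use B's Name
  // wherever City holds the "noClue" placeholder.
  def addRealCity(a: DataFrame, b: DataFrame): DataFrame =
    a.join(b, Seq("Key"), "left")
      .withColumn(
        "RealCity",
        // Assumption: keep the original City when B has no row for this Key,
        // rather than leaving RealCity null.
        when(col("City") === "noClue", coalesce(col("Name"), col("City")))
          .otherwise(col("City")))
      .drop("Name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RealCityExample")
      .getOrCreate()
    import spark.implicits._

    // The two tables from the question.
    val A = Seq(("a", "PDX"), ("b", "noClue")).toDF("Key", "City")
    val B = Seq(("c", "SYD"), ("b", "AKL")).toDF("Key", "Name")

    addRealCity(A, B).show()
    spark.stop()
  }
}
```

With the question's data, key b picks up AKL from B, and key a keeps PDX because its City is not 'noClue', so the missing match in B does not matter.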
Upvotes: 7