Reputation: 17042
In the example below, I would like to override the values in Spark DataFrame A with the corresponding values from DataFrame B (if they exist). Is there a way to do this with Spark (Scala)?
DataFrame A
ID  Name  Age
1   Paul  30
2   Sean  35
3   Rob   25

DataFrame B
ID  Name  Age
1   Paul  40

Result
ID  Name  Age
1   Paul  40
2   Sean  35
3   Rob   25
Upvotes: 1
Views: 96
Reputation: 1167
The combined use of a left join and coalesce should do the trick, something like:
import org.apache.spark.sql.functions.coalesce

dfA
  .join(dfB, Seq("ID"), "left")
  .select(
    dfA.col("ID"),
    dfA.col("Name"),
    coalesce(dfB.col("Age"), dfA.col("Age")).as("Age")
  )
Explanation: for a specific ID some_id, there are two cases:

- If dfB does not contain some_id, the left join produces null for dfB.col("Age"), and coalesce returns the first non-null value among the expressions passed to it, i.e. the value from dfA.col("Age").
- If dfB does contain some_id, the value from dfB.col("Age") is used.
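For completeness, here is a minimal, self-contained sketch of the approach, assuming a local SparkSession and the sample data from the question (the object name and the way dfA/dfB are built are illustrative only):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.coalesce

object OverrideExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OverrideExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question
    val dfA = Seq((1, "Paul", 30), (2, "Sean", 35), (3, "Rob", 25)).toDF("ID", "Name", "Age")
    val dfB = Seq((1, "Paul", 40)).toDF("ID", "Name", "Age")

    // Left join keeps every row of dfA; coalesce prefers dfB's Age when it exists
    val result = dfA
      .join(dfB, Seq("ID"), "left")
      .select(
        dfA.col("ID"),
        dfA.col("Name"),
        coalesce(dfB.col("Age"), dfA.col("Age")).as("Age")
      )

    result.orderBy("ID").show()
    // +---+----+---+
    // | ID|Name|Age|
    // +---+----+---+
    // |  1|Paul| 40|
    // |  2|Sean| 35|
    // |  3| Rob| 25|
    // +---+----+---+

    spark.stop()
  }
}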
Upvotes: 1