Reputation: 89
I have two dataframes. What I want to do exactly is:
If the column Name is "P" then I have to select the column called FinalValue of the DF2 where the column id_1 match the column Id_name of the DF2, otherwise I have to fill it with nulls.
For example, I have the following DataFrames (DF1 and DF2):
+--------+-------+-------+
|Name | value | id_1 |
+- ------+-------+-------+
|P |5 | being |
|X |1 | dose |
|Z |1 | yex |
df2
+--------+------------+
|Id_name | FinalValue |
+- ------+------------+
|ash | h32 |
|being | c11 |
|dose | g21 |
In this case the output should be:
+--------+-------+-------------+
|Name | value | FinalValue |
+- ------+-------+-------------+
|P |5 | c11 |
|X |1 | null |
|Z |1 | null |
What I am trying is the following:
var df3 = df1.withColumn("FinalValue", when($"Name" === "P", df2.select(...)))
But as you can see, I don't know how to continue because if I select a column of the DF2 I can't select another of the DF1. How can I do this?
Maybe my explanation is not good enough, if you need more information or explanation, just tell me it. Thanks in advance.
Upvotes: 0
Views: 756
Reputation: 42352
You can do a left join, then mask the final value using when
:
val df3 = df1.join(
df2,
df1("id_1") === df2("Id_name"),
"left"
).select(
df1.columns.dropRight(1).map(col) :+
when($"Name" === "P", $"FinalValue").as("FinalValue")
: _*
)
df3.show
+----+-----+----------+
|Name|value|FinalValue|
+----+-----+----------+
| P| 5| c11|
| X| 1| null|
| Z| 1| null|
+----+-----+----------+
Upvotes: 1