Reputation: 27
I have two dataframes for example,
DF1
+----+-------+------+------+
|id | value1|value2|value3|
+----+-------+------+------+
| 1 |10 |20 |30 |
| 2 |20 |30 |40 |
| 3 |30 |40 |50 |
+----+-------+------+------+
DF2
+----+-------+------+------+
|id | value1|value2|value3|
+----+-------+------+------+
| 1 |100 |200 |300 |
| 2 |200 |300 |400 |
| 3 |300 |400 |500 |
+----+-------+------+------+
I want to join these two dataframes on id and, at the same time, add together the columns that share a name. The result should look like this:
+----+-------+------+------+
|id | value1|value2|value3|
+----+-------+------+------+
| 1 |100+10 |200+20|300+30|
| 2 |200+20 |300+30|400+40|
| 3 |300+30 |400+40|500+50|
+----+-------+------+------+
The join part is fairly easy in Spark:
df1.join(df2, Seq("id"))
Is there an efficient way to add together the columns with the same name in the two dataframes, as above, with Scala and Spark?
Upvotes: 1
Views: 272
Reputation: 10406
In Scala it is quite easy to manipulate columns in a generic way. You can do it like this:
import org.apache.spark.sql.functions.col

// extract the names of the columns to sum (every column except the join key)
val cols = df1.columns.filter(_ != "id")

// join on id, then sum each pair of same-named columns,
// keeping the original column names in the result
val result = df1
  .join(df2, Seq("id"))
  .select(col("id") +: cols.map(c => (df1(c) + df2(c)).as(c)): _*)
Upvotes: 1