Reputation: 1335
For example first I have a dataframe like this
+----+-----+-----+--------------------+-----+
|year| make|model| comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla| S| No comment| |
|1997| Ford| E350|Go get one now th...| |
|2015|Chevy| Volt| null| null|
+----+-----+-----+--------------------+-----+
we have years 2012, 1997 and 2015. And we have another Dataframe like this
+----+-----+-----+--------------------+-----+
|year| make|model| comment|blank|
+----+-----+-----+--------------------+-----+
|2012|BMW | 3| No comment| |
|1997|VW | GTI | get | |
|2015|MB | C200| good| null|
+----+-----+-----+--------------------+-----+
we also have year 2012, 1997, 2015. How can we merge the rows with same year together? Thanks
The output should be like this
+----+-----+-----+--------------------+-----++-----+-----+--------------------------+
|year| make|model| comment|blank|| make|model| comment|blank|
+----+-----+-----+--------------------+-----++-----+-----+-----+--------------------+
|2012|Tesla| S| No comment| |BMW | 3 | no comment|
|1997| Ford| E350|Go get one now th...| |VW |GTI | get |
|2015|Chevy| Volt| null| null|MB |C200 | Good |null
+----+-----+-----+--------------------+-----++----+-----+-----+---------------------+
Upvotes: 0
Views: 658
Reputation: 2432
You can get what your desired table with a simple join
. Something like:
val joined = df1.join(df2, df1("year") === df2("year"))
I loaded your inputs such that I see the following:
scala> df1.show
...
year make model comment
2012 Tesla S No comment
1997 Ford E350 Go get one now
2015 Chevy Volt null
scala> df2.show
...
year make model comment
2012 BMW 3 No comment
1997 VW GTI get
2015 MB C200 good
When I run the join
, I get:
scala> val joined = df1.join(df2, df1("year") === df2("year"))
joined: org.apache.spark.sql.DataFrame = [year: string, make: string, model: string, comment: string, year: string, make: string, model: string, comment: string]
scala> joined.show
...
year make model comment year make model comment
2012 Tesla S No comment 2012 BMW 3 No comment
2015 Chevy Volt null 2015 MB C200 good
1997 Ford E350 Go get one now 1997 VW GTI get
One thing to note is that your column names may be ambiguous as they are named the same across dataframes (so you could change their names to make operations on your resultant dataframe easier to write).
Upvotes: 1