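The first DataFrame, Df, is:
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
+---+----+---+-----+
The second DataFrame, Df2, is (Name is empty in the first row):
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+
I need the output to be:
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+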
Kindly help!
Use the union or unionAll function (unionAll has been deprecated since Spark 2.0 in favor of union):
df1.unionAll(df2)
df1.union(df2)
For example:
scala> val a = (1,"12",1,333)
a: (Int, String, Int, Int) = (1,12,1,333)
scala> val b = (1,"",3,989)
b: (Int, String, Int, Int) = (1,"",3,989)
scala> val c = (7,"98",8,878)
c: (Int, String, Int, Int) = (7,98,8,878)
scala> import spark.implicits._
import spark.implicits._
scala> val df1 = List(a).toDF("ID","Name","ID2","Marks")
df1: org.apache.spark.sql.DataFrame = [ID: int, Name: string ... 2 more fields]
scala> val df2 = List(b, c).toDF("ID","Name","ID2","Marks")
df2: org.apache.spark.sql.DataFrame = [ID: int, Name: string ... 2 more fields]
scala> df1.show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
+---+----+---+-----+
scala> df2.show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+
scala> df1.union(df2).show
+---+----+---+-----+
| ID|Name|ID2|Marks|
+---+----+---+-----+
|  1|  12|  1|  333|
|  1|    |  3|  989|
|  7|  98|  8|  878|
+---+----+---+-----+
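The session above relies on spark-shell, where `spark` (a `SparkSession`) is already defined. In a compiled program you have to create the session yourself. A minimal sketch, assuming spark-sql 2.0 or later is on the classpath and Spark runs in local mode; `UnionExample` and `combined` are hypothetical names chosen for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Standalone version of the spark-shell session above.
// Assumes spark-sql (Spark 2.0+) on the classpath; object and method names are hypothetical.
object UnionExample {
  // Builds the two DataFrames from the question and stacks them with union.
  def combined(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df1 = Seq((1, "12", 1, 333)).toDF("ID", "Name", "ID2", "Marks")
    val df2 = Seq((1, "", 3, 989), (7, "98", 8, 878)).toDF("ID", "Name", "ID2", "Marks")
    df1.union(df2) // keeps every row from both inputs, like UNION ALL
  }

  def main(args: Array[String]): Unit = {
    // spark-shell predefines `spark`; a standalone program must create it.
    val spark = SparkSession.builder()
      .appName("UnionExample")
      .master("local[*]")
      .getOrCreate()
    combined(spark).show()
    spark.stop()
  }
}
```

Spark does not promise a particular row order for the combined result, so add an orderBy if the output order matters.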
A simple union or unionAll should do the trick for you:
Df.union(Df2)
or
Df.unionAll(Df2)
As the API documentation puts it:
"Returns a new Dataset containing union of rows in this Dataset and another Dataset. This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct. Also as standard in SQL, this function resolves columns by position (not by name)."
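The last two points of that quote (deduplication via distinct, and positional column resolution) can be checked directly. A sketch, assuming Spark 2.3 or later, since `Dataset.unionByName` was introduced in 2.3.0; `UnionSemantics`, `setUnion` and the sample data are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumes Spark 2.3+ (unionByName was added in 2.3.0).
// Object name, method name and data are hypothetical, for illustration only.
object UnionSemantics {
  // SQL-style set union: union alone behaves like UNION ALL, so follow it with distinct.
  def setUnion(a: DataFrame, b: DataFrame): DataFrame = a.union(b).distinct()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UnionSemantics")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The row (7, 8) appears in both inputs.
    val x = Seq((1, 1), (7, 8)).toDF("ID", "ID2")
    val y = Seq((7, 8), (9, 9)).toDF("ID", "ID2")
    println(x.union(y).count())     // 4: union keeps the duplicate (7, 8)
    println(setUnion(x, y).count()) // 3: distinct drops it

    // Same column names, opposite order.
    val reordered = Seq((80, 70)).toDF("ID2", "ID")
    x.union(reordered).show()       // by position: 80 ends up in ID
    x.unionByName(reordered).show() // by name: 70 ends up in ID

    spark.stop()
  }
}
```

When both DataFrames have the same column names but possibly a different column order, unionByName avoids silently mixing up values.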