Greg Clinton

Reputation: 375

How to combine two Spark DataFrames in sorted order

I want to combine two dataframes a and b into a dataframe c that is sorted on a column.

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")
val c = // how do I sort on char column?

Here is the result I want:

 a.show()     b.show()      c.show()
+----+---+   +----+---+    +----+---+
|char|num|   |char|num|    |char|num|
+----+---+   +----+---+    +----+---+
|   a|  1|   |   b|  4|    |   a|  1|
|   c|  2|   |   d|  5|    |   b|  4|
|   e|  3|   +----+---+    |   c|  2|
+----+---+                 |   d|  5|
                           |   e|  3|
                           +----+---+

Upvotes: 1

Views: 2636

Answers (2)

NAGARJUN446

Reputation: 11

If you want to union multiple DataFrames, you can do it this way:

val df1 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")

val df2 = sc.parallelize(List(
  (51, 3, "jane"),
  (35, 5, "bob")
)).toDF("age", "children", "name")

val df3 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")

val result = Seq(df1, df2, df3)
val res_union = result.reduce(_ union _).sort($"age", $"name", $"children")
res_union.show()
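
If this pattern comes up often, it could be wrapped in a small helper. This is only a sketch: `UnionMany` and `unionSorted` are made-up names, and it assumes every DataFrame has the same columns in the same order, since union matches columns by position rather than by name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object UnionMany {
  // Hypothetical helper: union a non-empty sequence of same-schema DataFrames,
  // then sort the combined result once by the given column names.
  def unionSorted(dfs: Seq[DataFrame], sortCols: String*): DataFrame = {
    require(dfs.nonEmpty, "need at least one DataFrame")   // reduce fails on an empty Seq
    require(sortCols.nonEmpty, "need at least one sort column")
    dfs.reduce(_ union _).sort(sortCols.map(col): _*)
  }
}
```

With the DataFrames above, this would be called as `UnionMany.unionSorted(Seq(df1, df2, df3), "age", "name", "children").show()`.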

Upvotes: 1

mrsrinivas

Reputation: 35404

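Put simply, union() the two DataFrames and then sort() the result. Sorting each input beforehand, as done here, is optional, because the final sort() determines the order of the output.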

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num").sort($"char")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num").sort($"char")

val c = a.union(b).sort($"char")
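
The snippets above assume a spark-shell session, where `spark` and its implicits are already in scope. As a rough standalone sketch (the object name `CombineSorted`, the method `combine`, and the local master setting are illustrative assumptions, not part of the answer), the same union-then-sort could look like this:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CombineSorted {
  // Builds the two example DataFrames from the question and returns them
  // combined and sorted on the "char" column.
  def combine(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF and the $"col" syntax

    val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
    val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")

    // union matches columns by position, so both inputs need the same column order;
    // a single sort after the union is what fixes the final row order
    a.union(b).sort($"char")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-sorted")
      .getOrCreate()

    combine(spark).show()
    spark.stop()
  }
}
```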

Upvotes: 2
