Reputation: 43
I have these two data frames called:
table1
+---------+------------+------+
| Animal | Owner |count1|
+---------+------------+------+
| Cat | Bob | 3 |
| Fish | Jerry | 2 |
| Dog | Bob | 2 |
| Turtle | Joe | 5 |
+---------+------------+------+
table2
+---------+------------+------+
| Animal | Owner |count2|
+---------+------------+------+
| Cat | Bob | 2 |
| Fish | Jerry | 1 |
| Dog | Bob | 3 |
| Snake | Kim | 6 |
+---------+------------+------+
I am trying to combine both data frames so that the result contains only the rows where count2 is greater than count1, plus the rows that appear in only one of the two tables. This is the expected output data frame:
+---------+------------+------+------+
| Animal | Owner |count1|count2|
+---------+------------+------+------+
| Dog | Bob | 2 | 3 |
| Turtle | Joe | 5 | null |
| Snake | Kim | null | 6 |
+---------+------------+------+------+
Rows that appear in 'table1' and not in 'table2' (or vice versa) should have 'null' for the missing count.
Upvotes: 1
Views: 47
Reputation: 1892
In Spark, try a full join with a filter condition.
scala> val t1 = Seq(("Cat","Bob",3), ("Fish","Jerry",2), ("Dog","Bob",2), ("Turtle","Joe",5)).toDF("Animal","Owner","count1")
scala> val t2 = Seq(("Cat","Bob",2), ("Fish","Jerry",1), ("Dog","Bob",3), ("Snake","Kim",6)).toDF("Animal","Owner","count2")
Full-join t1 (table1) and t2 (table2) on Animal and Owner, then filter for rows where count2 is greater than count1 or where either count column is null (i.e. the row exists in only one table).
scala> t2.join(t1,Seq("Animal","Owner"),"full").filter(col("count2")>col("count1") || col("count2").isNull || col("count1").isNull).show
+------+-----+------+------+
|Animal|Owner|count2|count1|
+------+-----+------+------+
| Dog| Bob| 3| 2|
| Snake| Kim| 6| null|
|Turtle| Joe| null| 5|
+------+-----+------+------+
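The same full-outer-join-plus-filter logic can be sketched in plain Scala without Spark, assuming small in-memory tables keyed by (Animal, Owner); the object and method names here are made up for illustration:

```scala
// Minimal sketch of a full outer join followed by the filter
// "count2 > count1 OR either count is missing", using plain Scala Maps.
object FullJoinSketch {
  type Key = (String, String) // (Animal, Owner)

  def combine(t1: Map[Key, Int], t2: Map[Key, Int]): Map[Key, (Option[Int], Option[Int])] =
    (t1.keySet ++ t2.keySet).flatMap { k =>
      val (c1, c2) = (t1.get(k), t2.get(k))
      // keep rows missing from one table, or where count2 > count1
      if (c1.isEmpty || c2.isEmpty || c2.get > c1.get) Some(k -> (c1, c2)) else None
    }.toMap

  def main(args: Array[String]): Unit = {
    val t1 = Map(("Cat", "Bob") -> 3, ("Fish", "Jerry") -> 2, ("Dog", "Bob") -> 2, ("Turtle", "Joe") -> 5)
    val t2 = Map(("Cat", "Bob") -> 2, ("Fish", "Jerry") -> 1, ("Dog", "Bob") -> 3, ("Snake", "Kim") -> 6)
    combine(t1, t2).foreach(println)
  }
}
```

A `None` in the result plays the role of Spark's `null` for the side of the join where the row is absent.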
Upvotes: 2