Reputation: 119
I have two dataframes :
df1:
+--------------+---------------------+
|id_device |tracking_time |
+--------------+---------------------+
|20 |2020-02-19 02:37:45 |
|5 |2020-02-17 17:15:45 |
+--------------+---------------------+
df2
+--------------+----------------------+
|id_device |tracking_time |
+--------------+----------------------+
|20 | 2019-02-19 02:41:45 |
|20 |2020-01-17 17:15:45 |
+--------------+----------------------+
I want to get the following output :
+--------------+---------------------+------------------+
|id_device |tracking_time | df2.tracking_time |
+--------------+---------------------+------------------+
|20 |2020-02-19 02:37:45 |2019-02-19 02:41:45|
|5 |2020-02-17 17:15:45 |null |
+--------------+---------------------+-------------------+
I tried the following code :
df1.registerTempTable("data");
df2.createOrReplaceTempView("tdays");
Dataset<Row> d_f = sparkSession.sql("select a.* , b.* from data as a LEFT JOIN (select * from tdays ) as b on b.id_device == a.id_device and b.tracking_time < a.tracking_time ");
I get the following output :
+----------------------+---------------------+--------------------+------------------ -+
|id_device |tracking_time | b.id_device |b.tracking_time |
+----------------------+---------------------+--------------------+--------------------+
|20 |2020-02-19 02:37:45 |20 | 2019-02-19 02:41:45|
|20 |2020-02-19 02:37:45 |20 | 2020-01-17 17:15:45|
|5 |2020-02-17 17:15:45 |null |null |
+-----------------------+--------------------+--------------------+--------------------+
What I want is to join the first dataframe with result of left join ordered by df2.tracking_time desc limit 1
I need your help
Upvotes: 1
Views: 47
Reputation: 14845
Before the join, you can reduce df2
to the minimum dates for each id_device
:
val df1 = ...
val df2 = ...
val df2min = df2.groupBy("id_device").agg(min("tracking_time")).as("df2.tracking_time")
val result = df1.join(df2min, Seq("id_device"), "left")
df2min
contains only a single row with the minimum date from df2
per id. Therefore the left join will return the expected result.
Upvotes: 1