Reputation: 151
I have a Hive table that looks like the following table
ID | Value | Timestamp (epoch)| Status
1 2300 1516187739 | Active
1 2500 1516187403 | Stopped
1 1800 1516187450 | Stopped
2 1300 1516187730 | Active
2 1500 1516187780 | Active
. I want to get the latest record by looking at the timestamp for each ID when status is active. I want to have only one record per ID. So the result would look like the table below. I am looking for an efficient way to implement this query in Hive.
ID | Value |
1 2300
2 1500
Upvotes: 0
Views: 81
Reputation: 31716
You could use row_number()
analytic function for such purpose.
SELECT a.id, a.VALUE
FROM (SELECT id,
VALUE,
ROW_NUMBER () OVER (PARTITION BY id ORDER BY timestamp DESC)
rn
FROM yourtable) a
WHERE rn = 1;
Upvotes: 2