Manos
Manos

Reputation: 151

How do I select the latest record for each disticnt id in Hive?

I have a Hive table that looks like the following table

ID | Value | Timestamp (epoch)| Status
 1   2300      1516187739     | Active
 1   2500      1516187403     | Stopped
 1   1800      1516187450     | Stopped
 2   1300      1516187730     | Active
 2   1500      1516187780     | Active

. I want to get the latest record by looking at the timestamp for each ID when status is active. I want to have only one record per ID. So the result would look like the table below. I am looking for an efficient way to implement this query in Hive.

ID | Value | 
 1   2300      
 2   1500      

Upvotes: 0

Views: 81

Answers (1)

Kaushik Nayak
Kaushik Nayak

Reputation: 31716

You could use row_number() analytic function for such purpose.

 SELECT a.id, a.VALUE
  FROM (SELECT id,
               VALUE,
               ROW_NUMBER () OVER (PARTITION BY id ORDER BY timestamp DESC)
                  rn
          FROM yourtable) a
 WHERE rn = 1;

Upvotes: 2

Related Questions