Vanaja Jayaraman
Vanaja Jayaraman

Reputation: 781

Hive - get latest record with no blank column

I have 2 tables: Table 1:

emp_id emp_log_id timestamp
1      234       04 Oct 2019 23:10
1                05 Oct 2019 23:10
2      335       04 Oct 2019 23:10
2                03 Oct 2019 23:10
3                04 Oct 2019 23:10
4      324       04 Oct 2019 23:10

My expected output is:

emp_id emp_log_id timestamp
1 234 04 Oct 2019 23:10
2 335 04 Oct 2019 23:10
3  04 Oct 2019 23:10
4 324 04 Oct 2019 23:10
  1. If the latest record has emp_log_id, then take that(sample emp_id: 2
  2. If the latest record doesn't have emp_log_id then go back to the previously updated record(sample emp_id: 1)

How to write hive query for this.

Another table has data like: Table 2:

emp_id emp_log_id
    1 234
    1  05
    2 335
    2  03
    3  04
    4 324

How to implement the same requirement in this table 2.

Please help.

Upvotes: 0

Views: 101

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269633

I would recommend row_number():

select t.*
from (select t.*,
             row_number() over (partition by emp_id
                                order by (emp_log_id is not null) desc, timestamp desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1

Upvotes: 1

Ed Bangga
Ed Bangga

Reputation: 13006

using aggregation function max() will do it.

select emp_id, max(emp_log_id) as emp_log_id, timestamp 
from table1
group by emp_id, timestamp

Upvotes: 1

Related Questions