Reputation: 1821
I have a table containing the following values. The last column encodes a day and an hour separated by a hyphen, so t1-1 is hour 1 of day 1. For each day, I would like to keep one row per unique (p, c) pair: the row where that pair last occurs in the day.
id1, p1, c1, t1-1
id2, p1, c1, t1-2
id3, p2, c2, t1-2
id4, p3, c3, t1-3
id5, p1, c1, t1-4
id6, p2, c2, t1-4
id7, p1, c3, t1-4
id8, p2, c2, t1-5
id9, p2, c3, t1-5
id10, p2, c4, t1-5
id11, p2, c3, t1-6
id12, p1, c1, t2-1
id13, p1, c2, t2-1
id14, p2, c2, t2-3
id15, p2, c3, t2-3
id16, p1, c1, t2-4
The output should be
id5, p1, c1, t1-4
id8, p2, c2, t1-5
id4, p3, c3, t1-3
id7, p1, c3, t1-4
id11, p2, c3, t1-6
id10, p2, c4, t1-5
id16, p1, c1, t2-4
id13, p1, c2, t2-1
id14, p2, c2, t2-3
id15, p2, c3, t2-3
Upvotes: 1
Views: 107
Reputation: 92795
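One way to do it in MySQL. Note that this relies on MySQL's non-standard GROUP BY extension (ONLY_FULL_GROUP_BY disabled): which row supplies the non-grouped columns is not guaranteed, and the inner ORDER BY does not formally control it, so verify the result on your server.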
SELECT id, p, c, t
  FROM
(
    SELECT id, p, c, t
      FROM table1
     ORDER BY t DESC
) q
 GROUP BY SUBSTRING_INDEX(t, '-', 1), p, c
Output:
| ID   | P  | C  | T    |
|------|----|----|------|
| id5  | p1 | c1 | t1-4 |
| id7  | p1 | c3 | t1-4 |
| id8  | p2 | c2 | t1-5 |
| id11 | p2 | c3 | t1-6 |
| id10 | p2 | c4 | t1-5 |
| id4  | p3 | c3 | t1-3 |
| id16 | p1 | c1 | t2-4 |
| id13 | p1 | c2 | t2-1 |
| id14 | p2 | c2 | t2-3 |
| id15 | p2 | c3 | t2-3 |
Here is a SQLFiddle demo
UPDATE: In Hive you can leverage analytic functions, specifically ROW_NUMBER()
SELECT id, p, c, t
  FROM
(
    SELECT id, p, c, t,
           ROW_NUMBER() OVER (PARTITION BY SUBSTR(t, 1, INSTR(t, '-') - 1), p, c
                              ORDER BY t DESC) rnum
      FROM table1
) q
 WHERE rnum = 1
Here is a SQLFiddle demo. It's for Oracle but should work in Hive as is.
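To try the ROW_NUMBER() approach without a Hive or Oracle instance, here is a minimal sketch that runs an adapted query against an in-memory SQLite database from Python. Several things here are assumptions and not part of the answer above: the use of SQLite (3.25 or later, for window functions), the helper name last_per_day_pair, the outer ORDER BY added only to make the output order stable, and the integer cast on the hour part. The cast is there because ordering by the raw string t would rank "t1-10" before "t1-9"; the sample data only has single-digit hours, so the original query is unaffected on it.

```python
import sqlite3

# Sample rows from the question: (id, p, c, t), where t is "<day>-<hour>".
ROWS = [
    ("id1", "p1", "c1", "t1-1"),
    ("id2", "p1", "c1", "t1-2"),
    ("id3", "p2", "c2", "t1-2"),
    ("id4", "p3", "c3", "t1-3"),
    ("id5", "p1", "c1", "t1-4"),
    ("id6", "p2", "c2", "t1-4"),
    ("id7", "p1", "c3", "t1-4"),
    ("id8", "p2", "c2", "t1-5"),
    ("id9", "p2", "c3", "t1-5"),
    ("id10", "p2", "c4", "t1-5"),
    ("id11", "p2", "c3", "t1-6"),
    ("id12", "p1", "c1", "t2-1"),
    ("id13", "p1", "c2", "t2-1"),
    ("id14", "p2", "c2", "t2-3"),
    ("id15", "p2", "c3", "t2-3"),
    ("id16", "p1", "c1", "t2-4"),
]

# ROW_NUMBER() query adapted for SQLite.
# Assumption: the hour is compared as an integer instead of ordering by the
# raw string t, so multi-digit hours rank correctly.
QUERY = """
SELECT id, p, c, t
FROM (
    SELECT id, p, c, t,
           ROW_NUMBER() OVER (
               PARTITION BY substr(t, 1, instr(t, '-') - 1), p, c
               ORDER BY CAST(substr(t, instr(t, '-') + 1) AS INTEGER) DESC
           ) AS rnum
    FROM table1
) q
WHERE rnum = 1
ORDER BY substr(t, 1, instr(t, '-') - 1), p, c
"""


def last_per_day_pair(rows):
    """Return the last row of each (day, p, c) group (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (id TEXT, p TEXT, c TEXT, t TEXT)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_per_day_pair(ROWS):
        print(", ".join(row))
```

On the sample data this should return the same ten rows as the output table above. If two rows in the same group share the same t, ROW_NUMBER() picks one of them arbitrarily, so a tie-breaker column would be needed in that case.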
Upvotes: 2