Reputation: 883
Say I've got 3 columns in a table: id, flag, time. Flag can only be one of the three: A1, A2, B.
ID flag time
1 A1 2016-01-01
1 A2 2016-01-02
1 B 2016-01-03
1 B 2016-01-04
2 A1 2016-01-02
2 B 2016-01-03
2 A2 2016-01-04
2 B 2016-01-05
The data has been sorted by time for each ID. Now I'd like to get, for each ID, when the flag equals B, the last non-B flag, e.g.:
1 B 2016-01-03 A2 2016-01-02
1 B 2016-01-04 A2 2016-01-02
2 B 2016-01-03 A1 2016-01-02
2 B 2016-01-05 A2 2016-01-04
Is this even possible in a Hive query?
Upvotes: 1
Views: 1520
Reputation: 44911
select id
,flag
,time
,A.flag as A_flag
,A.time as A_time
from (select id
,flag
,time
,max
(
case
when flag <> 'B'
then named_struct ('time',time,'flag',flag)
end
) over
(
partition by id
order by time
rows unbounded preceding
) as A
from t
) t
where flag = 'B'
;
+----+------+------------+--------+------------+
| id | flag | time | a_flag | a_time |
+----+------+------------+--------+------------+
| 1 | B | 2016-01-03 | A2 | 2016-01-02 |
| 1 | B | 2016-01-04 | A2 | 2016-01-02 |
| 2 | B | 2016-01-03 | A1 | 2016-01-02 |
| 2 | B | 2016-01-05 | A2 | 2016-01-04 |
+----+------+------------+--------+------------+
P.s.
time
) as column name.time
for date column.Upvotes: 1
Reputation: 49260
Use max
window function to get the running maximum time for non B flags. Then join
this result to the original table to get the flag information for the corresponding max time (before flag B for a given id).
SELECT X.*,
T.FLAG
FROM
(SELECT T.*,
MAX(CASE WHEN FLAG<>'B' THEN TIME END) OVER(PARTITION BY ID ORDER BY TIME) AS MAX_TIME_BEFORE_B
FROM T
) X
JOIN T ON T.ID=X.ID AND T.TIME=X.MAX_TIME_BEFORE_B
WHERE X.FLAG='B'
Upvotes: 1