Reputation: 311
Sample Data
touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10
The output should be
touristid|trip
ABC|4
The logic behind 4 is that it counts the runs of distinct consecutive days: 1, 1, 2 is the 1st trip, 4, 5, 6 is the 2nd, 8 is the 3rd, and 10 is the 4th. I want this output using an Impala query.
Upvotes: 1
Views: 195
Reputation: 38335
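Get the previous day using the lag() function, set new_trip_flag when day - prev_day > 1 or when there is no previous day (the first row for a tourist), then count(new_trip_flag). Since count() ignores NULLs, only the rows that start a new trip are counted.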
Demo:
with table1 as (
    select 'ABC' as touristid, 1 as day union all
    select 'ABC' as touristid, 1 as day union all
    select 'ABC' as touristid, 2 as day union all
    select 'ABC' as touristid, 4 as day union all
    select 'ABC' as touristid, 5 as day union all
    select 'ABC' as touristid, 6 as day union all
    select 'ABC' as touristid, 8 as day union all
    select 'ABC' as touristid, 10 as day
)
select touristid, count(new_trip_flag) as trip_cnt
from
( -- calculate new_trip_flag: true on the first row of each trip, NULL otherwise
    select touristid,
           case when (day - prev_day) > 1 or prev_day is NULL then true end as new_trip_flag
    from
    ( -- get prev_day (NULL for the first row of each tourist)
        select touristid, day,
               lag(day) over (partition by touristid order by day) as prev_day
        from table1
    ) s
) s
group by touristid;
Result:
touristid trip_cnt
ABC 4
The same query also works in Hive.
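A possible alternative, sketched here only for comparison and not part of the approach above: within a run of consecutive distinct days, day - dense_rank() stays constant, so counting the distinct values of that difference should give the number of trips. The alias grp is a name made up for this illustration, and the sketch assumes day is an integer column as in the sample data.

```sql
-- Sketch only: same sample data as in the demo.
with table1 as (
    select 'ABC' as touristid, 1 as day union all
    select 'ABC' as touristid, 1 as day union all
    select 'ABC' as touristid, 2 as day union all
    select 'ABC' as touristid, 4 as day union all
    select 'ABC' as touristid, 5 as day union all
    select 'ABC' as touristid, 6 as day union all
    select 'ABC' as touristid, 8 as day union all
    select 'ABC' as touristid, 10 as day
)
select touristid, count(distinct grp) as trip_cnt
from
( -- grp (hypothetical name) is the same for every row of one trip:
  -- dense_rank() gives duplicate days the same rank and grows by 1 per distinct day,
  -- so the difference only changes when a day is skipped
    select touristid,
           day - dense_rank() over (partition by touristid order by day) as grp
    from table1
) s
group by touristid;
```

For the sample data the differences work out to 0, 0, 0, 1, 1, 1, 2, 3, that is four distinct values, which matches the trip count of 4. The lag() version keeps the gap rule explicit (day - prev_day > 1), so it is easier to adapt if the definition of a trip changes.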
Upvotes: 1