Reputation: 385
My source data is like below
id start_time end_time
abc123 2022-07-03 12:00:00.0 2022-07-04 01:59:00.0
abc1234 2022-07-03 12:00:00.0 2022-07-04 01:59:00.0
abc123A 2022-07-03 13:00:00.0 2022-07-03 14:59:00.0
how to write a sql to get below output in hive. I am expecting sql should be dynamic since there is huge data
datetime Count
2022-07-03 12:00:00 2
2022-07-03 13:00:00 3
2022-07-03 14:00:00 3
2022-07-03 15:00:00 2
2022-07-03 16:00:00 2
2022-07-03 17:00:00 2
2022-07-03 18:00:00 2
2022-07-03 19:00:00 2
2022-07-03 20:00:00 2
2022-07-03 21:00:00 2
2022-07-03 22:00:00 2
2022-07-03 23:00:00 2
2022-07-04 00:00:00 2
2022-07-04 01:00:00 2
Upvotes: 0
Views: 525
Reputation: 385
Doing an inequality join with dimension gave me expected o/p
Dimension
+-------------------+
|start_time |
+-------------------+
|2020-03-31 12:00:00|
|2020-03-31 13:00:00|
|2020-03-31 14:00:00|
|2020-03-31 15:00:00|
|2020-03-31 16:00:00|
|2020-03-31 17:00:00|
|2020-03-31 18:00:00|
|2020-03-31 19:00:00|
|2020-03-31 20:00:00|
|2020-03-31 21:00:00|
|2020-03-31 22:00:00|
|2020-03-31 23:00:00|
|2020-04-01 00:00:00|
|2020-04-01 01:00:00|
|2020-04-01 02:00:00|
|2020-04-01 03:00:00|
+-------------------+
Join Condition
tab.start_time <= dim.start_time and tab.end_time > dim.start_time
Upvotes: 0
Reputation: 1270613
There is a trick in Hive to generating a series of numbers -- and these can then be turned into dates. This is a pain, but there is a trick using space()
and posexplode()
-- which is adapted from here.
I'm not sure if the syntax is 100% correct, but something like this:
with hh as (
select unix_timestamp(min(start_time)) + (n - 1) * 60*60 as hh_start,
unix_timestamp(min(start_time)) + n * 60*60 as hh_end
from (select unix_timestamp(min(start_time)) as min_st,
floor((unix_timestamp(max(end_time)) - unix_timestamp(min(start_time))) / (60 * 60)) as num_hours
from t
) x lateral view
posexplode(split(space(num_hours + 1), ' ')) pe as n, x
)
select hh.hh_start, count(t.id)
from hh left join
t
on t.start_time < hh.hh_end and
t.end_time >= hh.hh_start
group by hh.hh_start
order by h.hh_start;
Upvotes: 1