Reputation: 77
I need your help for a table. I am using an hql query in Hive to load a table. Anyone have an idea to load the table?
TABLE USER
START_TIME_DATE | END_TIME_DATE | USER | START_DAY_ID | END_DAY_ID
(String) (String) (Bigint) (Int) (Int)
210241 231236 1 01092019 01092019
234736 235251 2 01092019 01092019
223408 021345 3 01092019 02092019
The START_TIME_DATE, END__TIME_DATE fields indicate the time the user has been in the place. The idea is to indicate in different rows each of the hours the user has been, only the first two figures in the "hour" field.
TABLE USERHOUR
DATE | HOUR | ID
(Bigint) (String) (Bigint)
01092019 21 1
01092019 22 1
01092019 23 1
01092019 23 2
01092019 22 3
01092019 23 3
02092019 00 3
02092019 01 3
02092019 02 3
Currently my query is this, but it doesn't work. I am trying with "union all"
insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
from user
union all
select [start_time_date]+1,[end_time_date]
where [start_time_date]+1<=[end_time_date]
)
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER;
Upvotes: 3
Views: 180
Reputation: 38335
To do this I calculated the difference in hours, generated rows using posexplode(space(hours)), calculated start timestamp+(position from explode)*3600 and extracted hours and date from resulted timestamp.
See this demo using your example:
with mydata as(--this is your data
select stack(3,
'210241', '231236', 1, '01092019', '01092019',
'234736', '235251', 2, '01092019', '01092019',
'223408', '021345', 3, '01092019', '02092019'
) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID))
select --extract date, hour from timestamp calculated
--this can be done in previous step
--this subquery is to make code cleaner
date_format(dtm, 'ddMMyyyy') as DATE,
date_format(dtm, 'HH') as HOUR,
user as ID
from
(
select user,
start, h.i, hours, --these columns are for debugging
from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
--and convert to timestamp
from
(
select user,
--start timestamp (unix timestamp in seconds)
unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start,
floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
)/ --diff in seconds
3600) as hours --diff in hours
from mydata
)s
lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;
Result:
OK
01092019 21 1
01092019 22 1
01092019 23 1
01092019 23 2
01092019 22 3
01092019 23 3
02092019 00 3
02092019 01 3
02092019 02 3
Time taken: 3.207 seconds, Fetched: 9 row(s)
Upvotes: 3