Pilar
Pilar

Reputation: 77

calculate range between the difference of two fields with HQL

I need your help for a table. I am using an hql query in Hive to load a table. Anyone have an idea to load the table?

TABLE USER

START_TIME_DATE |    END_TIME_DATE |  USER   | START_DAY_ID  | END_DAY_ID  
    (String)             (String)    (Bigint)     (Int)           (Int)
210241              231236              1         01092019      01092019
234736              235251              2         01092019      01092019
223408              021345              3         01092019      02092019

The START_TIME_DATE, END__TIME_DATE fields indicate the time the user has been in the place. The idea is to indicate in different rows each of the hours the user has been, only the first two figures in the "hour" field.

TABLE USERHOUR

   DATE               |    HOUR    |    ID    
 (Bigint)                (String)    (Bigint)   
 01092019                   21            1
 01092019                   22            1
 01092019                   23            1
 01092019                   23            2            
 01092019                   22            3          
 01092019                   23            3            
 02092019                   00            3
 02092019                   01            3
 02092019                   02            3

Currently my query is this, but it doesn't work. I am trying with "union all"

insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
    from user
    union all
    select [start_time_date]+1,[end_time_date]
    where [start_time_date]+1<=[end_time_date]
    )
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER; 

Upvotes: 3

Views: 180

Answers (1)

leftjoin
leftjoin

Reputation: 38335

To do this I calculated the difference in hours, generated rows using posexplode(space(hours)), calculated start timestamp+(position from explode)*3600 and extracted hours and date from resulted timestamp.

See this demo using your example:

with mydata as(--this is your data
select stack(3,
'210241', '231236', 1, '01092019', '01092019',
'234736', '235251', 2, '01092019', '01092019',
'223408', '021345', 3, '01092019', '02092019'
) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID))

select --extract date, hour from timestamp calculated
       --this can be done in previous step
       --this subquery is to make code cleaner
       date_format(dtm, 'ddMMyyyy') as DATE, 
       date_format(dtm, 'HH')       as HOUR, 
       user                         as ID
from
(
select user, 
       start, h.i, hours, --these columns are for debugging
       from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
                                         --and convert to timestamp
from
(
select user,
       --start timestamp (unix timestamp in seconds) 
       unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start, 
       floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
              unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
             )/ --diff in seconds
        3600) as hours --diff in hours
  from mydata
)s
lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;

Result:

OK
01092019        21      1
01092019        22      1
01092019        23      1
01092019        23      2
01092019        22      3
01092019        23      3
02092019        00      3
02092019        01      3
02092019        02      3
Time taken: 3.207 seconds, Fetched: 9 row(s)

Upvotes: 3

Related Questions