Badri Prasad Rath
Badri Prasad Rath

Reputation: 61

Redshift SQL: Get date difference based on start and end dates

My table has start_date and end_date from which I need to find the hour difference. The issue is that both of these date times are not on the same day.

user    start_date  end_date    difference
Alex    7/25/2016 16:00 7/26/2016 0:30  8.5
Alex    7/24/2016 16:00 7/25/2016 0:30  8.5
Alex    7/21/2016 16:00 7/22/2016 0:30  8.5
Alex    7/20/2016 16:00 7/21/2016 0:30  8.5
Alex    7/19/2016 16:00 7/20/2016 0:30  8.5
Alex    7/18/2016 16:00 7/19/2016 0:30  8.5
Alex    7/17/2016 16:00 7/18/2016 0:30  8.5
Alex    7/14/2016 16:00 7/15/2016 0:30  8.5
Alex    7/13/2016 16:00 7/14/2016 0:30  8.5
Alex    7/12/2016 16:00 7/13/2016 0:30  8.5
Alex    7/11/2016 16:00 7/12/2016 0:30  8.5
Alex    7/10/2016 16:00 7/11/2016 0:30  8.5

Usually it is 5 working days and I get the answer if I group them by start_date. But I need an new date column where I need the output as below. Please note that 15/7/2016 and 22/7/2016 was not present in the above table. I need the additional 0.5 hour & date for the 6th day to be included to my derived table.

User    Date    difference
Alex    7/25/2016   8.5
Alex    7/24/2016   8.5
Alex    7/22/2016   0.5
Alex    7/21/2016   8.0
Alex    7/20/2016   8.5
Alex    7/19/2016   8.5
Alex    7/18/2016   8.5
Alex    7/17/2016   8.5
Alex    7/15/2016   0.5
Alex    7/14/2016   8.0
Alex    7/13/2016   8.5
Alex    7/12/2016   8.5
Alex    7/11/2016   8.5
Alex    7/10/2016   8.5

I calculate the difference as

round(cast(datediff(seconds, start_date, end_date) as decimal)/3600,2)

Upvotes: 1

Views: 2507

Answers (2)

Badri Prasad Rath
Badri Prasad Rath

Reputation: 61

here is how I have done the calc and it works perfectly

select user, trunc(start_time) as date1, 
       SUM(case when id = 1 then round(cast(datediff(seconds, start_time, st_t1) as decimal)/3600,2) end) as SCHEDULE

from
(
select user, start_time,
       case when trunc(start_time) <> trunc(end_time) then cast(to_char(start_time,'yyyy-mm-dd 23:59:59') as timestamp) else cast(to_char(end_time,'yyyy-mm-dd hh24:mi:ss') as timestamp) end as st_t1
from   table1 a
where id = 1
group by user_name, trunc(start_time)

union

select user_name, trunc(end_time) as date1,
       SUM(case when id = 1 then round(cast(datediff(seconds, st_t2, end_time) as decimal)/3600,2) end) as SCHEDULE

from
(
select user_name, end_time, 
       case when trunc(start_time) <> trunc(end_time) then cast(to_char(end_time,'yyyy-mm-dd 00:00:00') as timestamp) else cast(to_char(end_time,'yyyy-mm-dd hh24:mi:ss') as timestamp) end as st_t2
from   table1 a
where  id = 1    
)
group by user, trunc(end_time)

Upvotes: 1

moertel
moertel

Reputation: 1580

Whenever there is sophisticated logic, I'd suggest to use union queries and split the logic into a select query (or even table) each. Then you'd be able to calculate this in two steps. The main difference seems to be whether the 0.5 between 00:00:00 and 00:30:00 should be counted to the previous workday or whether it should stand alone. The latter seems to be determined based on whether the end_date is also a workday itself. I see three cases:

  • Next day is a workday:
    1. Report all hours on start_date
  • Next day is not a workday:
    1. Report hours from start_date to midnight on start_date
    2. Report hours from midnight to end_date on end_date

I used the following example data based on your description:

create temporary table _test (user varchar(20), start_date timestamp, end_date timestamp);
insert into _test values ('Alex', '7/25/2016 16:00', '7/26/2016 0:30'), ('Alex', '7/24/2016 16:00', '7/25/2016 0:30'), ('Alex', '7/21/2016 16:00', '7/22/2016 0:30'), ('Alex', '7/20/2016 16:00', '7/21/2016 0:30'), ('Alex', '7/19/2016 16:00', '7/20/2016 0:30'), ('Alex', '7/18/2016 16:00', '7/19/2016 0:30'), ('Alex', '7/17/2016 16:00', '7/18/2016 0:30'), ('Alex', '7/14/2016 16:00', '7/15/2016 0:30'), ('Alex', '7/13/2016 16:00', '7/14/2016 0:30'), ('Alex', '7/12/2016 16:00', '7/13/2016 0:30'), ('Alex', '7/11/2016 16:00', '7/12/2016 0:30'), ('Alex', '7/10/2016 16:00', '7/11/2016 0:30');

We will need to know whether the next day is a workday, so I suggest using the lead() window function (see documentation) which will give you the start_date from the next row.

create temporary table _differences as (
    select
        user_name
      , start_date::date as start_date
      , end_date::date as end_date
       /** 
        * Calculate difference in hours between start_date and end_date: */
      , round(cast(datediff(seconds, start_date, end_date) as decimal)/3600,2) as hours_start_to_end
       /** 
        * Calculate difference in hours between start_date and midnight: */
      , round(cast(datediff(seconds, start_date, dateadd(day, 1, start_date::date)) as decimal)/3600,2) as hours_start_to_midnight
       /**
        * Calculate difference between midnight on end_date and end_date: */
      , round(cast(datediff(seconds, end_date::date, end_date) as decimal)/3600,2) as hours_midnight_to_end
       /**
        * Calculate number of days from end_date until next start_date: */
      , datediff(day, end_date::date, lead(start_date::date) over(partition by user_name order by start_date::date)) as days_until_next_workday
    from
        _test
);

Then the following query:

    select
        user_name          as user_name
      , start_date         as ref_date
      , hours_start_to_end as difference
    from
        _differences
    where
        days_until_next_workday = 0 -- report all work hours on start_date
union
    select
        user_name               as user_name
      , start_date              as ref_date
      , hours_start_to_midnight as difference
    from
        _differences
    where
        days_until_next_workday > 0 -- report partial work hours on start_date
union
    select
        user_name             as user_name
      , end_date              as ref_date
      , hours_midnight_to_end as difference
    from
        _differences
    where
        days_until_next_workday > 0 -- report partial work hours on end_date
order by
    user_name
  , ref_date desc
;

Would yield the following result:

 user_name |  ref_date  | difference
-----------+------------+------------
 Alex      | 2016-07-24 |       8.50
 Alex      | 2016-07-22 |       0.50
 Alex      | 2016-07-21 |       8.00
 Alex      | 2016-07-20 |       8.50
 Alex      | 2016-07-19 |       8.50
 Alex      | 2016-07-18 |       8.50
 Alex      | 2016-07-17 |       8.50
 Alex      | 2016-07-15 |       0.50
 Alex      | 2016-07-14 |       8.00
 Alex      | 2016-07-13 |       8.50
 Alex      | 2016-07-12 |       8.50
 Alex      | 2016-07-11 |       8.50
 Alex      | 2016-07-10 |       8.50
(13 rows)

You can see that 7/25/2016 is missing because there is no start_date on or after 7/26/2016, so you'll need to figure out how to account for that special case.

Upvotes: 1

Related Questions