alwaysaskingquestions
alwaysaskingquestions

Reputation: 1657

MySQL: why is the date difference always 0?

I am practicing my SQL skills, and here's a problem I'm practicing on: https://www.hackerrank.com/challenges/projects

Here's the problem:

You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table. If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.

Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.

My logic is that, I'll order everything by start date, and then assign rankings to each row. If the current end date is consecutive from previous end date, aka datediff(end_date, @e) = 1, then I'll keep the same ranking number, else I'll increment the ranking by 1. So then basically all the rows belonging to the same project will have the same ranking.

Below is my code:

set @r := 0;
select min(start_date), max(end_date)
from (select start_date, end_date, 
      @r := if(datediff(end_date, @e) = 1, @r, @r+1) ranking,
      @e := end_date
      from projects
      order by start_date
     ) b
group by ranking
order by datediff(max(end_date), min(start_date)), 1
;

But this code is not working.

Update

After making changes according to Sasha Pachev's answer, when I tried to check the subquery:

set @r = 0;
select start_date, end_date, 
@e := end_date, datediff(end_date, date(@e)),
@r := if(datediff(end_date, date(@e)), @r, @r+1) ranking
from projects
order by start_date
;

I noticed that all my date diff are 0:

2015-10-01 2015-10-02 2015-10-02 0 1 
2015-10-02 2015-10-03 2015-10-03 0 2 
2015-10-03 2015-10-04 2015-10-04 0 3 
2015-10-04 2015-10-05 2015-10-05 0 4 
2015-10-11 2015-10-12 2015-10-12 0 5 
2015-10-12 2015-10-13 2015-10-13 0 6 
2015-10-15 2015-10-16 2015-10-16 0 7 
2015-10-17 2015-10-18 2015-10-18 0 8 
2015-10-19 2015-10-20 2015-10-20 0 9 

And I understand why, because I assigned the @e as the end date before assigning ranking @r. But if I assign @e after @r, it doesnt work since I need @e to have a value for the diffdate() function. Can someone help me break this cycle?

Upvotes: 1

Views: 2160

Answers (10)

Raj Kumar
Raj Kumar

Reputation: 1

select start_Date,end_date from (
select start_Date,end_date,datediff(day,start_Date,end_date)as diff1 from 
(
select start_Date,row_number() over (order by start_Date) as rn from
(select start_date from projects
where start_date not in (select distinct end_date from projects)
) t1)t1a

left join 
(select end_date,row_number() over (order by end_date) as rn from
(select end_date from projects
where end_date not in (select distinct start_date from projects)
) t2)t2a

on t1a.rn=t2a.rn )tx
order by diff1 , start_date

Upvotes: 0

Syed Ghufran Hassan
Syed Ghufran Hassan

Reputation: 3

select Min_Start_Date,Max_End_Date

from (

select distinct
 
       min(start_date) over (partition by Ref_Start_date) as Min_Start_Date,

       max(End_Date) over (Partition by Ref_End_Date) as Max_End_Date,

       count(*) over (partition by Ref_Start_date,Ref_End_Date) as CNT

from (

        select 

        start_date,

        start_date+dense_rank() over (order by start_date desc) as Ref_Start_date, 

        end_date,

        End_date+dense_rank() over (order by End_date desc) as Ref_End_date

        from projects

    )

 order by 3,1
);

Upvotes: 0

Somnath Ray
Somnath Ray

Reputation: 11

SELECT         START_DATE, MIN(END_DATE) AS [END_DATE]
    FROM       (
                    (SELECT         START_DATE 
                            FROM    PROJECTS 
                            WHERE   START_DATE NOT IN (SELECT END_DATE FROM PROJECTS)
                    ) A
                    CROSS JOIN
                    (SELECT        END_DATE 
                            FROM   PROJECTS 
                            WHERE  END_DATE NOT IN (SELECT START_DATE FROM PROJECTS)
                    ) B
               )     
    WHERE      START_DATE < END_DATE
    GROUP BY   START_DATE
    ORDER BY   DATEDIFF(DAY, MIN(END_DATE), START_DATE) DESC, CAST(START_DATE AS DATE) ASC

Upvotes: 1

Milad Firouzi
Milad Firouzi

Reputation: 1

I wrote it with a help of while loop and using temp table

select * , row_number() over (order by start_date) rn into #temp 
from projects
order by start_date

declare @counter int = 2
declare @e_date date = (select min(end_date) from #temp)

while @counter <= (select max(rn) from #temp)
begin
    if @e_date = (select start_date from #temp where rn = @counter)
    begin
        update #temp
        set start_date = (select start_date from #temp where rn = @counter -1 )
        where rn = @counter

        delete from #temp
        where rn = @counter - 1

        set @e_date = (select end_date from #temp where rn = @counter)
    end
else
    begin
        set @e_date = (select end_date from #temp where rn = @counter)
    end

     set @counter = @counter + 1
end


select start_date , end_date from #temp
order by datediff(day , start_date , end_date) , start_date

Upvotes: 0

sudarshan vp
sudarshan vp

Reputation: 29

/* Below is my Solution */


select Min_Start_Date,Max_End_Date
from (
select distinct 
       min(start_date) over (partition by Ref_Start_date) as Min_Start_Date,
       max(End_Date) over (Partition by Ref_End_Date) as Max_End_Date,
       count(*) over (partition by Ref_Start_date,Ref_End_Date) as CNT
from (
        select 
        start_date,
        start_date+dense_rank() over (order by start_date desc) as Ref_Start_date, 
        end_date,
        End_date+dense_rank() over (order by End_date desc) as Ref_End_date
        from projects
    )
 order by 3,1
);

Upvotes: 0

SELECT Start_Date, min(End_Date)
FROM 
 (SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a ,
 (SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY DATEDIFF(min(End_Date), Start_Date) ASC, Start_Date ASC;

Upvotes: 0

Rahul Guntha
Rahul Guntha

Reputation: 1

This will work

SELECT SD,ED
FROM
(SELECT MIN(START_DATE) SD, MAX(END_DATE) ED, ( MAX(END_DATE)-MIN(START_DATE)) dd
FROM
(
SELECT  START_DATE, END_DATE, END_DATE - ROW_NUMBER() OVER (ORDER BY END_DATE) DIST
FROM PROJECTS
 ORDER BY 1)
GROUP BY DIST
 ORDER BY 3,1);

Upvotes: 0

Bernd Buffen
Bernd Buffen

Reputation: 15057

You can do it with a query like this. you can do the init of the vars direct in the query:

SELECT min(start_date), max(end_date)
FROM (select start_date, end_date, 
      @r := if(datediff(end_date, @e) = 1, @r, @r+1) ranking,
      @e := end_date as ee
      FROM projects
      CROSS JOIN ( SELECT @r:=0, @e:='1970-01-01') as init
      ORDER BY start_date
     ) b
GROUP BY ranking
ORDER BY datediff(max(end_date), min(start_date));

sample output

2015-10-15 2015-10-16 
2015-10-17 2015-10-18 
2015-10-19 2015-10-20 
2015-10-21 2015-10-22 
2015-11-01 2015-11-02 
2015-11-17 2015-11-18 
2015-10-11 2015-10-13 
2015-11-11 2015-11-13 
2015-10-01 2015-10-05 
2015-11-04 2015-11-08 
2015-10-25 2015-10-31 

Upvotes: 0

alwaysaskingquestions
alwaysaskingquestions

Reputation: 1657

Thanks Sasha Pachev for some great suggestions!

Articulating my problems here definitely helped me to think more clearly. I've been stuck on this problem for two days, and now I just reached an epiphany...

I finally understood the root causes of my problem, and have figured out a fix. My biggest problem was that I did not set @e initially, so then when I assign @e before @r, the date difference is always going to be 0! (of course, because it's just subtracting itself). and if I assign @e after @r then it's always NULL.

Below is my code that is working and correct, hopefully this can help someone as well:

set @r = 0;
set @e = (select max(end_date) from projects);
select min(start_date), max(end_date)
from (
    select start_date, end_date, 
    @r := if(datediff(end_date, date(@e)) = 1, @r, @r+1) ranking,
    @e := end_date
    from projects
    order by start_date
    ) b
group by ranking
order by datediff(max(end_date), min(start_date)), 1
;

Upvotes: 0

Sasha Pachev
Sasha Pachev

Reputation: 5336

The order of assignment/use of user variables matters. Try:

select start_date, end_date,@e := end_date, datediff(end_date, date(@e)),
      @r := if(end_date-@e = 1, @r, @r+1) ranking,

      from projects
      order by start_date
;

assigning to @e first, and then using it in datediff()

Upvotes: 0

Related Questions