Reputation: 1657
I am practicing my SQL skills, and here's a problem I'm practicing on: https://www.hackerrank.com/challenges/projects
Here's the problem:
You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table. If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.
Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.
My logic is that, I'll order everything by start date, and then assign rankings to each row. If the current end date is consecutive from previous end date, aka datediff(end_date, @e) = 1, then I'll keep the same ranking number, else I'll increment the ranking by 1. So then basically all the rows belonging to the same project will have the same ranking.
Below is my code:
set @r := 0;
select min(start_date), max(end_date)
from (select start_date, end_date,
@r := if(datediff(end_date, @e) = 1, @r, @r+1) ranking,
@e := end_date
from projects
order by start_date
) b
group by ranking
order by datediff(max(end_date), min(start_date)), 1
;
But this code is not working.
Update
After making changes according to Sasha Pachev's answer, when I tried to check the subquery:
set @r = 0;
select start_date, end_date,
@e := end_date, datediff(end_date, date(@e)),
@r := if(datediff(end_date, date(@e)), @r, @r+1) ranking
from projects
order by start_date
;
I noticed that all my date diff are 0:
2015-10-01 2015-10-02 2015-10-02 0 1
2015-10-02 2015-10-03 2015-10-03 0 2
2015-10-03 2015-10-04 2015-10-04 0 3
2015-10-04 2015-10-05 2015-10-05 0 4
2015-10-11 2015-10-12 2015-10-12 0 5
2015-10-12 2015-10-13 2015-10-13 0 6
2015-10-15 2015-10-16 2015-10-16 0 7
2015-10-17 2015-10-18 2015-10-18 0 8
2015-10-19 2015-10-20 2015-10-20 0 9
And I understand why, because I assigned the @e as the end date before assigning ranking @r. But if I assign @e after @r, it doesnt work since I need @e to have a value for the diffdate() function. Can someone help me break this cycle?
Upvotes: 1
Views: 2160
Reputation: 1
select start_Date,end_date from (
select start_Date,end_date,datediff(day,start_Date,end_date)as diff1 from
(
select start_Date,row_number() over (order by start_Date) as rn from
(select start_date from projects
where start_date not in (select distinct end_date from projects)
) t1)t1a
left join
(select end_date,row_number() over (order by end_date) as rn from
(select end_date from projects
where end_date not in (select distinct start_date from projects)
) t2)t2a
on t1a.rn=t2a.rn )tx
order by diff1 , start_date
Upvotes: 0
Reputation: 3
select Min_Start_Date,Max_End_Date
from (
select distinct
min(start_date) over (partition by Ref_Start_date) as Min_Start_Date,
max(End_Date) over (Partition by Ref_End_Date) as Max_End_Date,
count(*) over (partition by Ref_Start_date,Ref_End_Date) as CNT
from (
select
start_date,
start_date+dense_rank() over (order by start_date desc) as Ref_Start_date,
end_date,
End_date+dense_rank() over (order by End_date desc) as Ref_End_date
from projects
)
order by 3,1
);
Upvotes: 0
Reputation: 11
SELECT START_DATE, MIN(END_DATE) AS [END_DATE]
FROM (
(SELECT START_DATE
FROM PROJECTS
WHERE START_DATE NOT IN (SELECT END_DATE FROM PROJECTS)
) A
CROSS JOIN
(SELECT END_DATE
FROM PROJECTS
WHERE END_DATE NOT IN (SELECT START_DATE FROM PROJECTS)
) B
)
WHERE START_DATE < END_DATE
GROUP BY START_DATE
ORDER BY DATEDIFF(DAY, MIN(END_DATE), START_DATE) DESC, CAST(START_DATE AS DATE) ASC
Upvotes: 1
Reputation: 1
I wrote it with a help of while loop and using temp table
select * , row_number() over (order by start_date) rn into #temp
from projects
order by start_date
declare @counter int = 2
declare @e_date date = (select min(end_date) from #temp)
while @counter <= (select max(rn) from #temp)
begin
if @e_date = (select start_date from #temp where rn = @counter)
begin
update #temp
set start_date = (select start_date from #temp where rn = @counter -1 )
where rn = @counter
delete from #temp
where rn = @counter - 1
set @e_date = (select end_date from #temp where rn = @counter)
end
else
begin
set @e_date = (select end_date from #temp where rn = @counter)
end
set @counter = @counter + 1
end
select start_date , end_date from #temp
order by datediff(day , start_date , end_date) , start_date
Upvotes: 0
Reputation: 29
/* Below is my Solution */
select Min_Start_Date,Max_End_Date
from (
select distinct
min(start_date) over (partition by Ref_Start_date) as Min_Start_Date,
max(End_Date) over (Partition by Ref_End_Date) as Max_End_Date,
count(*) over (partition by Ref_Start_date,Ref_End_Date) as CNT
from (
select
start_date,
start_date+dense_rank() over (order by start_date desc) as Ref_Start_date,
end_date,
End_date+dense_rank() over (order by End_date desc) as Ref_End_date
from projects
)
order by 3,1
);
Upvotes: 0
Reputation: 1
SELECT Start_Date, min(End_Date)
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a ,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY DATEDIFF(min(End_Date), Start_Date) ASC, Start_Date ASC;
Upvotes: 0
Reputation: 1
This will work
SELECT SD,ED
FROM
(SELECT MIN(START_DATE) SD, MAX(END_DATE) ED, ( MAX(END_DATE)-MIN(START_DATE)) dd
FROM
(
SELECT START_DATE, END_DATE, END_DATE - ROW_NUMBER() OVER (ORDER BY END_DATE) DIST
FROM PROJECTS
ORDER BY 1)
GROUP BY DIST
ORDER BY 3,1);
Upvotes: 0
Reputation: 15057
You can do it with a query like this. you can do the init of the vars direct in the query:
SELECT min(start_date), max(end_date)
FROM (select start_date, end_date,
@r := if(datediff(end_date, @e) = 1, @r, @r+1) ranking,
@e := end_date as ee
FROM projects
CROSS JOIN ( SELECT @r:=0, @e:='1970-01-01') as init
ORDER BY start_date
) b
GROUP BY ranking
ORDER BY datediff(max(end_date), min(start_date));
sample output
2015-10-15 2015-10-16
2015-10-17 2015-10-18
2015-10-19 2015-10-20
2015-10-21 2015-10-22
2015-11-01 2015-11-02
2015-11-17 2015-11-18
2015-10-11 2015-10-13
2015-11-11 2015-11-13
2015-10-01 2015-10-05
2015-11-04 2015-11-08
2015-10-25 2015-10-31
Upvotes: 0
Reputation: 1657
Thanks Sasha Pachev for some great suggestions!
Articulating my problems here definitely helped me to think more clearly. I've been stuck on this problem for two days, and now I just reached an epiphany...
I finally understood the root causes of my problem, and have figured out a fix. My biggest problem was that I did not set @e initially, so then when I assign @e before @r, the date difference is always going to be 0! (of course, because it's just subtracting itself). and if I assign @e after @r then it's always NULL.
Below is my code that is working and correct, hopefully this can help someone as well:
set @r = 0;
set @e = (select max(end_date) from projects);
select min(start_date), max(end_date)
from (
select start_date, end_date,
@r := if(datediff(end_date, date(@e)) = 1, @r, @r+1) ranking,
@e := end_date
from projects
order by start_date
) b
group by ranking
order by datediff(max(end_date), min(start_date)), 1
;
Upvotes: 0
Reputation: 5336
The order of assignment/use of user variables matters. Try:
select start_date, end_date,@e := end_date, datediff(end_date, date(@e)),
@r := if(end_date-@e = 1, @r, @r+1) ranking,
from projects
order by start_date
;
assigning to @e
first, and then using it in datediff()
Upvotes: 0