Raphael
Raphael

Reputation: 112

Dealing with Dates in MySQL. Total number of different projects completed

I have a table in an old version of MySQL 5.x like this:

+---------+------------+------------+
| Task_ID | Start_Date | End_Date   |
+---------+------------+------------+
|       1 | 2015-10-15 | 2015-10-16 |
|       2 | 2015-10-17 | 2015-10-18 |
|       3 | 2015-10-19 | 2015-10-20 |
|       4 | 2015-10-21 | 2015-10-22 |
|       5 | 2015-11-01 | 2015-11-02 |
|       6 | 2015-11-17 | 2015-11-18 |
|       7 | 2015-10-11 | 2015-10-12 |
|       8 | 2015-10-12 | 2015-10-13 |
|       9 | 2015-11-11 | 2015-11-12 |
|      10 | 2015-11-12 | 2015-11-13 |
|      11 | 2015-10-01 | 2015-10-02 |
|      12 | 2015-10-02 | 2015-10-03 |
|      13 | 2015-10-03 | 2015-10-04 |
|      14 | 2015-10-04 | 2015-10-05 |
|      15 | 2015-11-04 | 2015-11-05 |
|      16 | 2015-11-05 | 2015-11-06 |
|      17 | 2015-11-06 | 2015-11-07 |
|      18 | 2015-11-07 | 2015-11-08 |
|      19 | 2015-10-25 | 2015-10-26 |
|      20 | 2015-10-26 | 2015-10-27 |
|      21 | 2015-10-27 | 2015-10-28 |
|      22 | 2015-10-28 | 2015-10-29 |
|      23 | 2015-10-29 | 2015-10-30 |
|      24 | 2015-10-30 | 2015-10-31 |
+---------+------------+------------+

If the End_Date of the tasks are consecutive, then they are part of the same project. I am interested in finding the total number of different projects completed.

If there is more than one project that have the same number of completion days, then order by the Start_Date of the project.

For this few sample records the expected output would be:

2015-10-15 2015-10-16
2015-10-17 2015-10-18
2015-10-19 2015-10-20
2015-10-21 2015-10-22
2015-11-01 2015-11-02
2015-11-17 2015-11-18
2015-10-11 2015-10-13
2015-11-11 2015-11-13
2015-10-01 2015-10-05
2015-11-04 2015-11-08
2015-10-25 2015-10-31

I am a bit jammed with this. I would really appreciate any help. Thanks.

Upvotes: 1

Views: 383

Answers (3)

Chema
Chema

Reputation: 2838

It's a little tricky problem, but the query below works fine.

It builds two tables, one with Start_Date and other with End_Date that NOT IN End_Date and Start_Date respectively from Projects table, and query these tables fetching Start_Date WHERE Start_Date < End_Date grouping by Start_Date using aggregate function MIN with End_Date to get a complete Project.

DATEDIFF(MIN(End_Date), Start_Date) to calculate project_duration and able to order by project_duration.

SELECT Start_Date, MIN(End_Date) AS End_Date, DATEDIFF(MIN(End_Date), Start_Date) AS project_duration
FROM
    (SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
    (SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY project_duration ASC, Start_Date ASC;

expected output

+------------+------------+---------------+
| Start_Date | End_Date   | project_duration |
+------------+------------+---------------+
| 2015-10-15 | 2015-10-16 |             1 |
| 2015-10-17 | 2015-10-18 |             1 |
| 2015-10-19 | 2015-10-20 |             1 |
| 2015-10-21 | 2015-10-22 |             1 |
| 2015-11-01 | 2015-11-02 |             1 |
| 2015-11-17 | 2015-11-18 |             1 |
| 2015-10-11 | 2015-10-13 |             2 |
| 2015-11-11 | 2015-11-13 |             2 |
| 2015-10-01 | 2015-10-05 |             4 |
| 2015-11-04 | 2015-11-08 |             4 |
| 2015-10-25 | 2015-10-31 |             6 |
+------------+------------+---------------+

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270061

This answers -- and answers correctly -- the original version of this question.

Hmmmm . . . I think you can use variables. The simplest way is to generate a sequential number and then subtract this value to get a constant for adjacent rows from the date:

select min(start_date), max(end_date)
from (select t.*, (@rn := @rn + 1) as rn
      from (select t.* from tasks t order by end_date) t cross join
           (select @rn := 0) params
     ) t
group by (end_date - interval rn day);

Here is a db<>fiddle.

Upvotes: 2

Harshil Doshi
Harshil Doshi

Reputation: 3592

Following query should work:

   select  tmp.projectid, date_sub(max(tmp.ed2), interval max(tmp.projectdays) day) start_date,
       max(tmp.ed2) end_date, 
      max(tmp.projectdays) No_Of_ProjectDays
from
(
select t1.task_id tid1, t1.start_date sd1, t1.end_date ed1,
       t2.task_id tid2, t2.start_date sd2, t2.end_date ed2,
     case when  datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
          then  (@pid := @pid + 1)
          else  (@pid := @pid)
     end as ProjectId,
      case when  datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
          then  (@pdays := 1) 
          else  (@pdays := @pdays + 1)
     end as ProjectDays
     from tasks t1 right join tasks t2
on t2.task_id = t1.task_id + 1
cross join (select @pid :=1, @pdays := 1) vars
) tmp
group by tmp.projectid
order by max(tmp.projectdays), start_date

Please find the Demo here.

EDIT : I have made changes in the query and link according to new data sample. Please have a look.

Upvotes: 2

Related Questions