ErikL
ErikL

Reputation: 2041

mysql: count records in a subtable, grouped by dates

I have 2 tables, a table with tasks, and a table "Tasklog" where the progress of a task is stored

Tasks
ID Description 
1  task1
2  task2
3  task3
4  task4
5  task5

Tasklog

ID taskId dtTime        Status
1   1       2016-01-01  new
2   1       2016-02-10  in progress
3   1       2016-03-03  closed
4   2       2016-01-01  new
5   2       2016-01-10  in progress
6   2       2016-01-11  closed
7   3       2016-01-01  new 
8   4       2016-01-01  new
9   5       2016-01-01  new
10  5       2016-01-01  in progress

One task can have multiple tasklog records Now I want to create a report to show how many tasks where "open" in per month (open means that they have no tasklog record with status "closed" in that month) in the last 24 months.

so I need something like this:

2016-01 4
2016-02 4
2016-03 3

I believe these should be the actual numbers the query should result in.

I'm struggling to write a proper query to do this. I think I'm close, but with the query below I'm counting tasklog records, instead I would like to count the task records, regardless of how many tasklog records they have.

select month(dtTime) month, year(dtTime) year, count(t.id) as number
    from tasklog tl
    join tasks t on t.id = taskId
    where dtTime > DATE_ADD(now(), INTERVAL -2 YEAR)
    and t.id not in ( 
        select id from tasklog tl1 
        where status in ('closed')
        and month(tl1.dtTime) = month(tl.dtTime) 
        and year(tl1.dtTime) = year(tl.dtTime)
    )
    group by month(dtTime), year(dtTime)
    order by dtTime;

Any suggestions how I could best do this?

code for creating / filling the tables:

CREATE TABLE tasks
    (`id` int, `description` varchar(5))
;

INSERT INTO tasks
    (`id`, `description`)
VALUES
    (1, 'task1'),
    (2, 'task2'),
    (3, 'task3'),
    (4, 'task4'),
    (5, 'task5')
;


CREATE TABLE tasklog
    (`ID` int, `taskId` int, `dtTime` datetime, `Status` varchar(11))
;

INSERT INTO tasklog
    (`ID`, `taskId`, `dtTime`, `Status`)
VALUES
    (1, 1, '2016-01-01 00:00:00', 'new'),
    (2, 1, '2016-02-10 00:00:00', 'in progress'),
    (3, 1, '2016-03-03 00:00:00', 'closed'),
    (4, 2, '2016-01-01 00:00:00', 'new'),
    (5, 2, '2016-01-10 00:00:00', 'in progress'),
    (6, 2, '2016-01-11 00:00:00', 'closed'),
    (7, 3, '2016-01-01 00:00:00', 'new'),
    (8, 4, '2016-01-01 00:00:00', 'new'),
    (9, 5, '2016-01-01 00:00:00', 'new'),
    (10, 5, '2016-01-01 00:00:00', 'in progress')
;

Update in response to Stefano Zanini

It seems both solutions do not get me the proper results. The right result would be this I believe:

2016-01 4
2016-02 4
2016-03 3

your query only gives me:

1   2016    4

If I add the distinct to my query, I get:

1   2016    5
2   2016    1
3   2016    1 

Upvotes: 1

Views: 341

Answers (2)

ErikL
ErikL

Reputation: 2041

This is what I ended up with:

select  distinct mm.year as year, 
                 mm.month as month, 
                 count(distinct tl.taskId) number
from    (select distinct month(dtTime) month , year(dttime) year  from tasklog) mm
join    tasklog tl
 on     mm.month >= month(tl.dtTime) and mm.year >= year(tl.dtTime) and tl.Status !='closed'
left join
        tasklog tl2
 on     mm.month >= month(tl2.dtTime) and mm.year >= year(tl2.dtTime) and tl2.Status = 'closed' and tl.taskId = tl2.taskId
where   tl2.taskId is null
group by mm.month
order by 1;

Upvotes: 0

Stefano Zanini
Stefano Zanini

Reputation: 5926

I would go with a left join

select month(tl.dtTime) month, year(tl.dtTime) year, count(t.id) as number
    from tasks t
    join tasklog tl
    on t.id = tl.taskId and tl.status = 'new'
    left join tasklog tl2
    on t.id = tl2.taskId and month(tl.dtTime) = month(tl2.dtTime) and tl2.status = 'closed'
    where dtTime > DATE_ADD(now(), INTERVAL -2 YEAR)
    and tl2.taskId is null
    group by month(tl.dtTime), year(tl.dtTime)
    order by dtTime;

edit

...but you may only need to add a distinct clause in your count

count(distinct t.id)

edit after further explaination

I misunderstood the requirement. This will do the trick:

select  distinct mm.month, count(distinct tl.taskId)
from    (select distinct month(dtTime) mm from tasklog) mm
join    tasklog tl
 on     mm.montId >= month(tl.dtTime) and tl.Status = 'new'
left join
        tasklog tl2
 on     mm.month >= month(tl2.dtTime) and tl2.Status = 'closed' and tl.taskId = tl2.taskId
where   tl2.taskId is null
group by mm.month
order by 1;

The first join joins each month with all the tasks opened up to that month. With the second join, the left one, you give a value to each month / taskId if the task was closed in that month or in a earlier one. From here you can filter only the open tasks for each month filtering the rows that don't have this value.

You'll have to add the year and maybe refine the join clauses, but with the sample data you provided this query is perfectly fine.

Upvotes: 1

Related Questions