Ravi
Ravi

Reputation: 667

New event count on each day

I have a table Events which has 2 columns dt (Date) and event. I am trying to get the new event count for each day. For e.g. if the event appeared today and tomorrow, then it should only be counted for today (as it is show for the first time) and should be excluded from tomorrow's count.

+------------+-------+
| dt         | event |
+------------+-------+
| 2019-02-01 | u1    |
| 2019-02-01 | u2    |
| 2019-02-01 | u3    |
| 2019-02-01 | u4    |
| 2019-02-01 | u1    |
| 2019-02-02 | u5    |
| 2019-02-02 | u1    |
| 2019-02-02 | u6    |
| 2019-02-02 | u4    |
| 2019-02-03 | u8    |
| 2019-02-03 | u9    |
| 2019-02-03 | u6    |
+------------+-------+

I'm trying to get the below output:

+------------+-------------------------+
| dt         | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 |                       4 |
| 2019-02-02 |                       2 |
| 2019-02-03 |                       2 |
+------------+-------------------------+

The query I am using is:

with cte (dt,event,rnk) as
(
select dt, 
event,
row_number() over (partition by dt order by event) as rnk
from events order by event
)
select e.dt, count(distinct c.event) from events e left join cte c
on  e.event <> c.event where e.dt = c.dt
group by 1;

But I am not able to filter out the already appeared event from the previous day. I believe something small is missing. The result from my query is:

+------------+-------------------------+
| dt         | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 |                       4 |
| 2019-02-02 |                       4 |
| 2019-02-03 |                       3 |
+------------+-------------------------+

I am not sure if LEAD / LAG function can help solve the issue. Also is there a way that I could achieve this without using any joins.

Table creation and insertion queries:

create table events (dt Date, event varchar(5));

insert into events values('2019-02-01', 'u1');
insert into events values('2019-02-01',  'u2');
insert into events values('2019-02-01',  'u3');
insert into events values('2019-02-01',  'u4');
insert into events values('2019-02-01',  'u1');
insert into events values('2019-02-02',  'u5');
insert into events values('2019-02-02',  'u1');
insert into events values('2019-02-02',  'u6');
insert into events values('2019-02-02',  'u4');
insert into events values('2019-02-03',  'u8');
insert into events values('2019-02-03',  'u9');
insert into events values('2019-02-03',  'u6');

Upvotes: 1

Views: 108

Answers (2)

Strawberry
Strawberry

Reputation: 33945

Here's a solution for older versions: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=12ba0196cf90994d3c9d853ac62405fe

CREATE TABLE events
    (`dt` varchar(10), `event` varchar(2))
INSERT INTO events
    (`dt`, `event`)
VALUES
    ('2019-02-01', 'u1'),
    ('2019-02-01', 'u2'),
    ('2019-02-01', 'u3'),
    ('2019-02-01', 'u4'),
    ('2019-02-01', 'u1'),
    ('2019-02-02', 'u5'),
    ('2019-02-02', 'u1'),
    ('2019-02-02', 'u6'),
    ('2019-02-02', 'u4'),
    ('2019-02-03', 'u8'),
    ('2019-02-03', 'u9'),
    ('2019-02-03', 'u6')
SELECT dt, COUNT(*) total FROM
(SELECT event, MIN(dt) dt
      FROM events
      GROUP BY event) x
GROUP BY dt

Actually, the subquery is all you need; the rest can be handled in application code.

Upvotes: 2

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

You can use row_number to find when is the first day the event appear.

SQL DEMO

WITH ranks as (
    SELECT *, row_number() over (partition by event order by dt) as rn
    FROM events
) 
SELECT dt, COUNT(event)
FROM ranks
WHERE rn = 1 
GROUP BY dt

OUTPUT

|         dt | count |
|------------|-------|
| 2019-02-01 |     4 |
| 2019-02-02 |     2 |
| 2019-02-03 |     2 |

Upvotes: 2

Related Questions