New event count on each day

Question

I have a table Events which has 2 columns dt (Date) and event. I am trying to get the new event count for each day. For e.g. if the event appeared today and tomorrow, then it should only be counted for today (as it is show for the first time) and should be excluded from tomorrow's count.

+------------+-------+
| dt         | event |
+------------+-------+
| 2019-02-01 | u1    |
| 2019-02-01 | u2    |
| 2019-02-01 | u3    |
| 2019-02-01 | u4    |
| 2019-02-01 | u1    |
| 2019-02-02 | u5    |
| 2019-02-02 | u1    |
| 2019-02-02 | u6    |
| 2019-02-02 | u4    |
| 2019-02-03 | u8    |
| 2019-02-03 | u9    |
| 2019-02-03 | u6    |
+------------+-------+

I'm trying to get the below output:

+------------+-------------------------+
| dt         | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 |                       4 |
| 2019-02-02 |                       2 |
| 2019-02-03 |                       2 |
+------------+-------------------------+

The query I am using is:

with cte (dt,event,rnk) as
(
select dt, 
event,
row_number() over (partition by dt order by event) as rnk
from events order by event
)
select e.dt, count(distinct c.event) from events e left join cte c
on  e.event <> c.event where e.dt = c.dt
group by 1;

But I am not able to filter out the already appeared event from the previous day. I believe something small is missing. The result from my query is:

+------------+-------------------------+
| dt         | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 |                       4 |
| 2019-02-02 |                       4 |
| 2019-02-03 |                       3 |
+------------+-------------------------+

I am not sure if LEAD / LAG function can help solve the issue. Also is there a way that I could achieve this without using any joins.

Table creation and insertion queries:

create table events (dt Date, event varchar(5));

insert into events values('2019-02-01', 'u1');
insert into events values('2019-02-01',  'u2');
insert into events values('2019-02-01',  'u3');
insert into events values('2019-02-01',  'u4');
insert into events values('2019-02-01',  'u1');
insert into events values('2019-02-02',  'u5');
insert into events values('2019-02-02',  'u1');
insert into events values('2019-02-02',  'u6');
insert into events values('2019-02-02',  'u4');
insert into events values('2019-02-03',  'u8');
insert into events values('2019-02-03',  'u9');
insert into events values('2019-02-03',  'u6');

Juan Carlos Oropeza · Accepted Answer

You can use row_number to find when is the first day the event appear.

SQL DEMO

WITH ranks as (
    SELECT *, row_number() over (partition by event order by dt) as rn
    FROM events
) 
SELECT dt, COUNT(event)
FROM ranks
WHERE rn = 1 
GROUP BY dt

OUTPUT

|         dt | count |
|------------|-------|
| 2019-02-01 |     4 |
| 2019-02-02 |     2 |
| 2019-02-03 |     2 |

New event count on each day

Answers (2)

Related Questions