Reputation: 667
I have a table Events which has 2 columns dt
(Date) and event
. I am trying to get the new event count for each day. For e.g. if the event appeared today and tomorrow, then it should only be counted for today (as it is show for the first time) and should be excluded from tomorrow's count.
+------------+-------+
| dt | event |
+------------+-------+
| 2019-02-01 | u1 |
| 2019-02-01 | u2 |
| 2019-02-01 | u3 |
| 2019-02-01 | u4 |
| 2019-02-01 | u1 |
| 2019-02-02 | u5 |
| 2019-02-02 | u1 |
| 2019-02-02 | u6 |
| 2019-02-02 | u4 |
| 2019-02-03 | u8 |
| 2019-02-03 | u9 |
| 2019-02-03 | u6 |
+------------+-------+
I'm trying to get the below output:
+------------+-------------------------+
| dt | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 | 4 |
| 2019-02-02 | 2 |
| 2019-02-03 | 2 |
+------------+-------------------------+
The query I am using is:
with cte (dt,event,rnk) as
(
select dt,
event,
row_number() over (partition by dt order by event) as rnk
from events order by event
)
select e.dt, count(distinct c.event) from events e left join cte c
on e.event <> c.event where e.dt = c.dt
group by 1;
But I am not able to filter out the already appeared event from the previous day. I believe something small is missing. The result from my query is:
+------------+-------------------------+
| dt | count(distinct c.event) |
+------------+-------------------------+
| 2019-02-01 | 4 |
| 2019-02-02 | 4 |
| 2019-02-03 | 3 |
+------------+-------------------------+
I am not sure if LEAD / LAG function can help solve the issue. Also is there a way that I could achieve this without using any joins.
Table creation and insertion queries:
create table events (dt Date, event varchar(5));
insert into events values('2019-02-01', 'u1');
insert into events values('2019-02-01', 'u2');
insert into events values('2019-02-01', 'u3');
insert into events values('2019-02-01', 'u4');
insert into events values('2019-02-01', 'u1');
insert into events values('2019-02-02', 'u5');
insert into events values('2019-02-02', 'u1');
insert into events values('2019-02-02', 'u6');
insert into events values('2019-02-02', 'u4');
insert into events values('2019-02-03', 'u8');
insert into events values('2019-02-03', 'u9');
insert into events values('2019-02-03', 'u6');
Upvotes: 1
Views: 108
Reputation: 33945
Here's a solution for older versions: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=12ba0196cf90994d3c9d853ac62405fe
CREATE TABLE events
(`dt` varchar(10), `event` varchar(2))
INSERT INTO events
(`dt`, `event`)
VALUES
('2019-02-01', 'u1'),
('2019-02-01', 'u2'),
('2019-02-01', 'u3'),
('2019-02-01', 'u4'),
('2019-02-01', 'u1'),
('2019-02-02', 'u5'),
('2019-02-02', 'u1'),
('2019-02-02', 'u6'),
('2019-02-02', 'u4'),
('2019-02-03', 'u8'),
('2019-02-03', 'u9'),
('2019-02-03', 'u6')
SELECT dt, COUNT(*) total FROM
(SELECT event, MIN(dt) dt
FROM events
GROUP BY event) x
GROUP BY dt
Actually, the subquery is all you need; the rest can be handled in application code.
Upvotes: 2
Reputation: 48197
You can use row_number
to find when is the first day the event appear.
WITH ranks as (
SELECT *, row_number() over (partition by event order by dt) as rn
FROM events
)
SELECT dt, COUNT(event)
FROM ranks
WHERE rn = 1
GROUP BY dt
OUTPUT
| dt | count |
|------------|-------|
| 2019-02-01 | 4 |
| 2019-02-02 | 2 |
| 2019-02-03 | 2 |
Upvotes: 2