Reputation: 583
Using PostgreSQL, I would like to track the presence of each distinct id's from one day to the next one in the following table. How many were added / removed since the previous date ? How many were there at both dates ?
Date | Id |
---|---|
2021-06-28 | 1 |
2021-06-28 | 2 |
2021-06-28 | 3 |
2021-06-29 | 3 |
2021-06-29 | 4 |
2021-06-29 | 5 |
2021-06-30 | 4 |
2021-06-30 | 5 |
2021-06-30 | 6 |
I am thus looking for a SQL query that returns this kind of results:
Date | Added | Idle | Removed |
---|---|---|---|
2021-06-28 | 3 | 0 | 0 |
2021-06-29 | 2 | 1 | 2 |
2021-06-30 | 1 | 2 | 1 |
Do you have any idea ?
Upvotes: 0
Views: 77
Reputation: 246838
First, for each day and id
figure out the date when this id
last occured.
The number of retained id
s is the count of rows per day where the id
occurred the previous day.
New are all values that were not retained, and removed were the rows from the previous day that were not retained.
In SQL:
SELECT d,
total - retained AS added,
retained AS idle,
lag(total, 1, 0::bigint) OVER (ORDER BY d) - retained AS removed
FROM (SELECT d,
count(prev_d) FILTER (WHERE d - 1 = prev_d) AS retained,
count(*) AS total
FROM (SELECT d,
lag(d, 1, '-infinity') OVER (PARTITION BY id ORDER BY d) AS prev_d
FROM mytable) AS q
GROUP BY d) AS p;
d │ added │ idle │ removed
════════════╪═══════╪══════╪═════════
2021-06-28 │ 3 │ 0 │ 0
2021-06-29 │ 2 │ 1 │ 2
2021-06-30 │ 1 │ 2 │ 1
(3 rows)
Upvotes: 2