qcha
qcha

Reputation: 583

SQL query to track the presence of distinct values in time

Using PostgreSQL, I would like to track the presence of each distinct id's from one day to the next one in the following table. How many were added / removed since the previous date ? How many were there at both dates ?

Date Id
2021-06-28 1
2021-06-28 2
2021-06-28 3
2021-06-29 3
2021-06-29 4
2021-06-29 5
2021-06-30 4
2021-06-30 5
2021-06-30 6

I am thus looking for a SQL query that returns this kind of results:

Date Added Idle Removed
2021-06-28 3 0 0
2021-06-29 2 1 2
2021-06-30 1 2 1

Do you have any idea ?

Upvotes: 0

Views: 77

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246838

First, for each day and id figure out the date when this id last occured. The number of retained ids is the count of rows per day where the id occurred the previous day. New are all values that were not retained, and removed were the rows from the previous day that were not retained.

In SQL:

SELECT d,
       total - retained AS added,
       retained AS idle,
       lag(total, 1, 0::bigint) OVER (ORDER BY d) - retained AS removed
FROM (SELECT d,
             count(prev_d) FILTER (WHERE d - 1 = prev_d) AS retained,
             count(*) AS total
      FROM (SELECT d,
                   lag(d, 1, '-infinity') OVER (PARTITION BY id ORDER BY d) AS prev_d
            FROM mytable) AS q
      GROUP BY d) AS p;

     d      │ added │ idle │ removed 
════════════╪═══════╪══════╪═════════
 2021-06-28 │     3 │    0 │       0
 2021-06-29 │     2 │    1 │       2
 2021-06-30 │     1 │    2 │       1
(3 rows)

Upvotes: 2

Related Questions