Reputation: 3140
i would like to find/mark every 4th day in a continuous date stream inserted into my table for each user in a given date range
CREATE TABLE mytable (
id INTEGER,
myuser INTEGER,
day DATE NOT NULL,
PRIMARY KEY (id)
);
the problem is, that only 3 continuous days are valid per user, after that, there has to be a one day "break"
id | myuser | day |
-----+--------+------------+
0 | 200 | 2012-01-12 | }
1 | 200 | 2012-01-13 | }--> 3 continuous days
2 | 200 | 2012-01-14 | }
3 | 200 | 2012-01-15 | <-- not ok, user 200 should get warned and delete this
4 | 200 | 2012-01-16 | }
5 | 200 | 2012-01-17 | }--> 3 continuous days
6 | 200 | 2012-01-18 | }
7 | 200 | 2012-01-19 | <-- not ok, user 200 should get warned and delete this
8 | 201 | 2012-01-12 | }
9 | 201 | 2012-01-13 | }--> 3 continuous days
10 | 201 | 2012-01-14 | }
11 | 201 | 2012-01-16 | <-- ok, there is a one day gap here
12 | 201 | 2012-01-17 |
the main goal is to look at a given date range (usually a month) and identify days, which are not allowed. Also i have to take care that the overlapping dates are handled correctly, for example, if i look on a date range from 2012-02-01 to 2012-02-29, 2012-02-01 could be a "break" day if 2012-01-29 to 2012-01-31 is present in that table for the same user.
Upvotes: 4
Views: 312
Reputation: 657972
Much simpler and faster with the window function lag()
:
SELECT myuser
,day
,COALESCE(lag(day, 3) OVER (PARTITION BY myuser ORDER BY day) = (day - 3)
,FALSE) AS break_overdue
FROM mytable
WHERE day BETWEEN ('2012-01-12'::date - 3) AND '2012-01-16'::date;
Result:
myuser | day | break_overdue
--------+------------+---------------
200 | 2012-01-12 | f
200 | 2012-01-13 | f
200 | 2012-01-14 | f
200 | 2012-01-15 | t
200 | 2012-01-16 | t
201 | 2012-01-12 | f
201 | 2012-01-13 | f
201 | 2012-01-14 | f
201 | 2012-01-16 | f
The query marks all days as break_overdue
after three consecutive days. It is unclear whether you want all of them marked after the rule has been broken or just every 4th day.
I include 3 days before the start date (not just two) to determine whether the first day is already in violation of the rule.
The test is simple: if the 3rd row before the current row within the partition equals the current day - 3 then the rule has been broken. I wrap it all in COALESCE
to fold NULL
values to FALSE
for cosmetic reasons only. Guaranteed to work as long as (myuser, day)
is unique.
In PostgreSQL you can subtract integers form a date, effectively subtracting days.
Can be done in a single query level, no CTE or subquery needed. Should be much faster.
You need PostgreSQL 8.4 or later for window functions.
Upvotes: 2
Reputation: 86765
I don't have access to PostgreSQL, but hopefully this works...
WITH
grouped_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY myuser ORDER BY day) - (day - start_date) AS user_group_id,
myuser,
day
FROM
myTable
WHERE
day >= start_date - 3
AND day <= end_date
)
,
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY myuser, user_group_id ORDER BY day) AS sequence_id,
myuser,
day
FROM
grouped_data
)
SELECT
myuser,
day,
CASE WHEN sequence_id % 4 = 0 THEN 1 ELSE 0 END as should_be_a_break_day
FROM
sequenced_data
WHERE
day >= start_date
Sorry I didn't explain the workings, I had to jump into a meeting :)
Example with start_date = '2012-01-14'...
id | myuser | day | ROW_NUMBER() | day - start_date | user_group_id
----+--------+------------+--------------+------------------+---------------
0 | 200 | 2012-01-12 | 1 | -2 | 1 - -2 = 3
1 | 200 | 2012-01-13 | 2 | -1 | 2 - -1 = 3
2 | 200 | 2012-01-14 | 3 | 0 | 3 - 0 = 3
3 | 200 | 2012-01-15 | 4 | 1 | 4 - 1 = 3
4 | 200 | 2012-01-16 | 5 | 2 | 5 - 2 = 3
----+--------+------------+--------------+------------------+---------------
5 | 201 | 2012-01-12 | 1 | -2 | 1 - -2 = 3
6 | 201 | 2012-01-13 | 2 | -1 | 2 - -1 = 3
7 | 201 | 2012-01-14 | 3 | 0 | 3 - -1 = 3
8 | 201 | 2012-01-16 | 4 | 2 | 4 - 2 = 2
Any sequential dates will have the same user_group_id. Each 'gap' in the days makes that user_group_id decrease by 1 (see row 8, if the record was for the 17th, a 2 day gap, the id would have been 1).
Once you have a group_id, row_number() can be easily used to say which day in the sequence it is. A max of 3 day is the same as "Every 4th day should be a gap", and "x % 4 = 0" identifies every 4th day.
Upvotes: 5