Reputation: 147
I am trying to figure out how to find beginning of each sequence of records, where sequence stars when the time difference between records that belong to the same category is greater for example 5 minuntes.
For example, I have a table like:
field1 | field2 | date | category |
---|---|---|---|
14 | abc | 2021-05-05 16:11:00 | 1 |
13 | abc | 2021-05-05 16:11:35 | 1 |
20 | eee | 2021-05-05 18:22:22 | 2 |
21 | aaa | 2021-05-05 20:00:00 | 1 |
28 | www | 2021-05-05 21:22:22 | 1 |
27 | ddd | 2021-05-05 21:23:33 | 1 |
which is sorted by date in ascending order. I would like to mark beginning of each sequence for category 1, so the expected result would be:
field1 | field2 | date | category | result | Comment |
---|---|---|---|---|---|
14 | abc | 2021-05-05 16:11:00 | 1 | 1 | Session starts |
13 | abc | 2021-05-05 16:11:35 | 1 | 0 | |
20 | eee | 2021-05-05 18:22:22 | 2 | 0 | |
21 | aaa | 2021-05-05 20:00:00 | 1 | 1 | Session starts |
28 | www | 2021-05-05 21:22:22 | 1 | 1 | Session starts |
27 | ddd | 2021-05-05 21:23:33 | 1 | 0 |
The first problem that I am facing with is because there are two categories. So far I found time difference in minutes between consecutive rows:
WITH data as (
SELECT
table1.field1,
table1.field2,
table1.date,
table1.category
FROM table1
)
SELECT *,
@diff:= TIMESTAMPDIFF(MINUTE, @prev_date, data.date) AS diff,
@prev_date:= data.date as dummy
FROM data
so, I get:
field1 | field2 | date | category | diff |
---|---|---|---|---|
14 | abc | 2021-05-05 16:11:00 | 1 | -312 |
13 | abc | 2021-05-05 16:11:35 | 1 | 0 |
20 | eee | 2021-05-05 18:22:22 | 2 | 130 |
21 | aaa | 2021-05-05 20:00:00 | 1 | 97 |
28 | www | 2021-05-05 21:22:22 | 1 | 82 |
27 | ddd | 2021-05-05 21:23:33 | 1 | 1 |
but, I do not know how to create beginning of sequences. Acceptetble solution is also if category 2 'resets' session that belong to category 1. So, after every record that has category 2, a new sequence starts for category 1, regardless of previous timestamp.
Here is the dataset:
CREATE TABLE `table1` (
`field1` int(11) NOT NULL,
`field2` varchar(50) NOT NULL,
`date` datetime DEFAULT NULL,
`category` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `table1`
(`field1`, `field2`, `date`, `category`)
VALUES
(14, 'abc', '2021-05-05 16:11:00', '1'),
(13, 'abc', '2021-05-05 16:11:35', '1'),
(20, 'eee', '2021-05-05 18:22:22', '2'),
(21, 'aaa', '2021-05-05 20:00:00', '1'),
(28, 'www', '2021-05-05 21:22:22', '1'),
(27, 'ddd', '2021-05-05 21:23:33', '1');
and the Fiddle of same.
Upvotes: 0
Views: 20
Reputation: 42728
SELECT *,
CASE WHEN TIMESTAMPDIFF(second, LAG(`date`) OVER (PARTITION BY category ORDER BY `date`), `date`) < 300
THEN 0
ELSE 1
END result,
CASE WHEN TIMESTAMPDIFF(second, LAG(`date`) OVER (PARTITION BY category ORDER BY `date`), `date`) < 300
THEN ''
ELSE 'Session starts'
END Comment
FROM table1
ORDER BY `date`
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=b0d75c7a448201580c53605d439c56d3
Upvotes: 1
Reputation: 37482
The lag()
window function let you access the "previous" record so that you can check the time difference.
SELECT field1,
field2,
date,
category,
category = 1
AND (lag(date) OVER (ORDER BY date) IS NULL
OR timestampdiff(MINUTE, lag(date) OVER (ORDER BY date), date) > 5) result,
CASE
WHEN category = 1
AND (lag(date) OVER (ORDER BY date) IS NULL
OR timestampdiff(MINUTE, lag(date) OVER (ORDER BY date), date) > 5) THEN
'Session starts'
END comment
FROM table1
ORDER BY date;
Upvotes: 1