Sap1234
Sap1234

Reputation: 147

How to classify records based don running difference?

I am trying to figure out how to find beginning of each sequence of records, where sequence stars when the time difference between records that belong to the same category is greater for example 5 minuntes.

For example, I have a table like:

field1 field2 date category
14 abc 2021-05-05 16:11:00 1
13 abc 2021-05-05 16:11:35 1
20 eee 2021-05-05 18:22:22 2
21 aaa 2021-05-05 20:00:00 1
28 www 2021-05-05 21:22:22 1
27 ddd 2021-05-05 21:23:33 1

which is sorted by date in ascending order. I would like to mark beginning of each sequence for category 1, so the expected result would be:

field1 field2 date category result Comment
14 abc 2021-05-05 16:11:00 1 1 Session starts
13 abc 2021-05-05 16:11:35 1 0
20 eee 2021-05-05 18:22:22 2 0
21 aaa 2021-05-05 20:00:00 1 1 Session starts
28 www 2021-05-05 21:22:22 1 1 Session starts
27 ddd 2021-05-05 21:23:33 1 0

The first problem that I am facing with is because there are two categories. So far I found time difference in minutes between consecutive rows:

WITH data as (
SELECT 
    table1.field1, 
    table1.field2, 
    table1.date, 
    table1.category 
FROM table1
)

SELECT *, 
    @diff:= TIMESTAMPDIFF(MINUTE, @prev_date, data.date) AS diff,
    @prev_date:= data.date as dummy
FROM data

so, I get:

field1 field2 date category diff
14 abc 2021-05-05 16:11:00 1 -312
13 abc 2021-05-05 16:11:35 1 0
20 eee 2021-05-05 18:22:22 2 130
21 aaa 2021-05-05 20:00:00 1 97
28 www 2021-05-05 21:22:22 1 82
27 ddd 2021-05-05 21:23:33 1 1

but, I do not know how to create beginning of sequences. Acceptetble solution is also if category 2 'resets' session that belong to category 1. So, after every record that has category 2, a new sequence starts for category 1, regardless of previous timestamp.

Here is the dataset:

CREATE TABLE `table1` (
  `field1` int(11) NOT NULL,
  `field2` varchar(50) NOT NULL,
  `date` datetime DEFAULT NULL,
  `category` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `table1` 
(`field1`, `field2`, `date`, `category`) 
VALUES
(14, 'abc', '2021-05-05 16:11:00', '1'),
(13, 'abc', '2021-05-05 16:11:35', '1'),
(20, 'eee', '2021-05-05 18:22:22', '2'),
(21, 'aaa', '2021-05-05 20:00:00', '1'),
(28, 'www', '2021-05-05 21:22:22', '1'),
(27, 'ddd', '2021-05-05 21:23:33', '1');

and the Fiddle of same.

Upvotes: 0

Views: 20

Answers (2)

Akina
Akina

Reputation: 42728

SELECT *,
       CASE WHEN TIMESTAMPDIFF(second, LAG(`date`) OVER (PARTITION BY category ORDER BY `date`), `date`) < 300
            THEN 0
            ELSE 1
            END result,
       CASE WHEN TIMESTAMPDIFF(second, LAG(`date`) OVER (PARTITION BY category ORDER BY `date`), `date`) < 300
            THEN ''
            ELSE 'Session starts'
            END Comment
FROM table1
ORDER BY `date`

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=b0d75c7a448201580c53605d439c56d3

Upvotes: 1

sticky bit
sticky bit

Reputation: 37482

The lag() window function let you access the "previous" record so that you can check the time difference.

SELECT field1,
       field2,
       date,
       category,
       category = 1
       AND (lag(date) OVER (ORDER BY date) IS NULL
             OR timestampdiff(MINUTE, lag(date) OVER (ORDER BY date), date) > 5) result,
       CASE
         WHEN category = 1
              AND (lag(date) OVER (ORDER BY date) IS NULL
                    OR timestampdiff(MINUTE, lag(date) OVER (ORDER BY date), date) > 5) THEN
           'Session starts'
       END comment
       FROM table1
       ORDER BY date;

db<>fiddle

Upvotes: 1

Related Questions