Edward Sheriff Curtis
Edward Sheriff Curtis

Reputation: 497

Compare time between consequent rows using MySQL 5.5

This is my table tusers on MySQL 5.5.1 database community version

mysql> SELECT * FROM `tusers`;
+------------+------------+----------+-----+
| tIDUSER    | tDate      | tHour    | tID |
+------------+------------+----------+-----+
| Controneri | 2022-01-06 | 07:54:42 |   1 |
| Controneri | 2022-01-06 | 07:43:38 |   2 |
| Controneri | 2022-01-06 | 07:13:09 |   3 |
| Controneri | 2022-01-06 | 06:31:52 |   4 |
| Controneri | 2022-01-06 | 06:13:12 |   5 |
+------------+------------+----------+-----+
5 rows in set (0.13 sec)

I need select from the table tusers only these rows

+------------+------------+----------+-----+
| tIDUSER    | tDate      | tHour    | tID |
+------------+------------+----------+-----+
| Controneri | 2022-01-06 | 07:43:38 |   2 |
| Controneri | 2022-01-06 | 06:13:12 |   5 |
+------------+------------+----------+-----+

Because the other rows are repeated for the same user Controneri within one hour compared to the previous row.

Each user access to the web page is stored on the table tusers for date and time. But I have to extract only the first access and exclude the repeated accesses in the time span of one hour. On this example the user Controneri on January 6 he was logged in 5 times. But the valid accesses are those at 06:13:12 and 07:43:38, because after the access at 06:13:12 there were other accesses before 07:13:12, i.e. before the end of the hour compared to the hours 06:13:12 (06:31:52 and 07:13:09 , rows 4 and 3).

I have tried without success.

My table structure and the Select query below on db-fiddle.com, which offers MySQL 5

Any suggestion?

-- ----------------------------
-- Table structure for tusers
-- ----------------------------
DROP TABLE IF EXISTS `tusers`;
CREATE TABLE `tusers`  (
  `tIDUSER` varchar(255) NULL DEFAULT NULL,
  `tDate` date NULL DEFAULT NULL,
  `tHour` time NULL DEFAULT NULL,
  `tID` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`tID`) USING BTREE
) ENGINE = InnoDB;

-- ----------------------------
-- Records of tusers
-- ----------------------------
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:54:42', 1);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:43:38', 2);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:13:09', 3);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '06:31:52', 4);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '06:13:12', 5);

SELECT
        a.tID,
        a.tDate,
        a.tHour,
        a.tIDUSER,
        TIMEDIFF( a.tHour, b.tHour ) AS tDif 
    FROM
        `tusers` a
        JOIN `tusers` b ON 
        a.tDate = b.tDate 
        AND a.tIDUSER = b.tIDUSER 
        AND a.tID > b.tID 
    WHERE
        ( TIMEDIFF( a.tHour, b.tHour ) BETWEEN '00:00:00' AND '01:00:00' ) 
    ORDER BY
        a.tIDUSER,
        a.tDate,
        a.tHour ASC;

Upvotes: 0

Views: 80

Answers (2)

user1191247
user1191247

Reputation: 12998

For MySQL 5.5 you can achieve this by tracking the previous values in user variables -

SELECT tIDUSER, tDate, tHour, tID
FROM (
    SELECT
        tusers.*,
        IF(@prev_date_time IS NULL OR @prev_user <> tIDUSER OR @prev_date_time + INTERVAL 1 HOUR < TIMESTAMP(tDate, tHour), @prev_date_time := TIMESTAMP(tDate, tHour), NULL) AS `show`,
        @prev_user := tIDUSER
    FROM tusers, (SELECT @prev_date_time := NULL, @prev_user := NULL) n
    ORDER BY tIDUSER ASC, tDate ASC, tHour ASC
) t
WHERE `show` IS NOT NULL
ORDER BY tIDUSER ASC, tDate ASC, tHour ASC;

And here's a db<>fiddle. Thanks to sticky bit as I took the liberty of "borrowing" from their db<>fiddle.

The MySQL 5.6 manual states -

However, the order of evaluation for expressions involving user variables is undefined.

And in later versions is extended to say -

The order of evaluation for expressions involving user variables is undefined. For example, there is no guarantee that SELECT @a, @a:=@a+1 evaluates @a first and then performs the assignment.

The MySQL 5.7 manual also states -

It is also possible to assign a value to a user variable in statements other than SET. (This functionality is deprecated in MySQL 8.0 and subject to removal in a subsequent release.) When making an assignment in this way, the assignment operator must be := and not = because the latter is treated as the comparison operator = in statements other than SET:

Despite the above warnings, this method has been widely used for many years. Your mileage may vary.

I suspect this will perform badly with larger result sets but give it a try.

Upvotes: 1

sticky bit
sticky bit

Reputation: 37472

As requested by the OP in the comments, here is a query using recursive CTEs which will be available with MySQL version 8 and higher.

WITH RECURSIVE
cte1
AS
(
SELECT tusers.tiduser,
       tusers.tdate,
       tusers.thour,
       tusers.tid,
       addtime(tusers.tdate, tusers.thour) AS sane_timestamp_representation,
       row_number() OVER (PARTITION BY tusers.tiduser
                          ORDER BY addtime(tusers.tdate, tusers.thour) ASC) AS rn
       FROM tusers
),
cte2
AS
(
SELECT cte1.tiduser,
       cte1.tdate,
       cte1.thour,
       cte1.tid,
       cte1.sane_timestamp_representation,
       0 AS n
       FROM cte1
UNION ALL
SELECT cte1.tiduser,
       cte1.tdate,
       cte1.thour,
       cte1.tid,
       cte1.sane_timestamp_representation,
       cte2.n + 1 AS n
       FROM cte2
            INNER JOIN cte1
                       ON cte2.tiduser = cte1.tiduser
                          AND cte1.sane_timestamp_representation > adddate(cte2.sane_timestamp_representation, INTERVAL 1 HOUR)
),
cte3
AS
(
SELECT cte2.tiduser,
       cte2.tdate,
       cte2.thour,
       cte2.tid,
       cte2.sane_timestamp_representation,
       row_number() OVER (PARTITION BY cte2.tiduser,
                                       cte2.n
                          ORDER BY cte2.sane_timestamp_representation ASC) rn
       FROM cte2
)
SELECT cte3.tiduser,
       cte3.tdate,
       cte3.thour,
       cte3.tid
       FROM cte3
       WHERE cte3.rn = 1
       ORDER BY cte3.tiduser ASC,
                cte3.sane_timestamp_representation ASC;

db<>fiddle

1. In cte1 we first and foremost unite that day and hour part of the timestamp (not the brightest idea to save them as two different columns; it'll become a mess when day boundaries have to be crossed). We also assign a row_number() rn according to the timestamp in ascending order per user. cte1 acts as our "base table" from now on.

2. Now in cte2 the recursiveness happens. As anchor we take all the rows from cte1 where cte1.rn = 1. These are the records for a user with the minimum timestamp for that user. We also add some number n. For those initial anchor rows we set n to 0. n will act as an indicator which rows cannot cover each other. All rows with an n + x for x > 1 cannot be covered by any row with n (per user).

In the recursive step we join all records from cte1 past an hour per user. Since these cannot be covered by the records already in the result set (per user), they're past an hour, we assign n + 1 as n to them.

3. cte3 adds another row_number() rn ordering the records by the timestamp ascending per user and n. Those with an rn of 1 aren't covered themselves by any previous record for the user because all others with equal or greater n have greater timestamps and those with lesser n don't cover them as per we constructed n. So we can select these records from cte3 where rn = 1 and get our final result.

One big fat warning though:

The intermediate result sets will grow rapidly! You can try to select from cte3 without a WHERE clause and see for yourself. So while this shows it can be done theoretically, it might not be practical, even for medium sets of data. The needed resources can quickly exceed maximums.
(And well, since AFAIK SQL with recursive CTEs is Turing complete and the problem seems well computable, it was clear that it can be done anyway. But it still was interesting to see an example how it can be done, I think.)

Maybe it can be optimized. The key, I believe, is to limit the joined rows in the recursive step. We actually only need to join the oldest record past an hour, that would be the next record of interest. That would also make cte3 and the WHERE in the final SELECT unnecessary (unless for projection to get rid of the helper columns). But I didn't find a way to do so. LIMIT as well as window functions aren't allowed or implemented for recursive CTEs, at least in the recursive step. But if somebody comes up with such an optimization, I'd love to see it!

Oh, and the stupid timestamp representation in two columns, which needs to be put together at first, will also render the use of indexes on the timestamps impossible. So that's another factor limiting performance here.

Upvotes: 1

Related Questions