Reputation: 497
This is my table tusers
on MySQL 5.5.1 database community version
mysql> SELECT * FROM `tusers`;
+------------+------------+----------+-----+
| tIDUSER | tDate | tHour | tID |
+------------+------------+----------+-----+
| Controneri | 2022-01-06 | 07:54:42 | 1 |
| Controneri | 2022-01-06 | 07:43:38 | 2 |
| Controneri | 2022-01-06 | 07:13:09 | 3 |
| Controneri | 2022-01-06 | 06:31:52 | 4 |
| Controneri | 2022-01-06 | 06:13:12 | 5 |
+------------+------------+----------+-----+
5 rows in set (0.13 sec)
I need select from the table tusers
only these rows
+------------+------------+----------+-----+
| tIDUSER | tDate | tHour | tID |
+------------+------------+----------+-----+
| Controneri | 2022-01-06 | 07:43:38 | 2 |
| Controneri | 2022-01-06 | 06:13:12 | 5 |
+------------+------------+----------+-----+
Because the other rows are repeated for the same user Controneri
within one hour compared to the previous row.
Each user access to the web page is stored on the table tusers for date and time.
But I have to extract only the first access and exclude the repeated accesses in the time span of one hour.
On this example the user Controneri
on January 6 he was logged in 5 times. But the valid accesses are those at 06:13:12
and 07:43:38
, because after the access at 06:13:12 there were other accesses before 07:13:12
, i.e. before the end of the hour compared to the hours 06:13:12
(06:31:52
and 07:13:09
, rows 4 and 3).
I have tried without success.
My table structure and the Select query
below on db-fiddle.com, which offers MySQL 5
Any suggestion?
-- ----------------------------
-- Table structure for tusers
-- ----------------------------
DROP TABLE IF EXISTS `tusers`;
CREATE TABLE `tusers` (
`tIDUSER` varchar(255) NULL DEFAULT NULL,
`tDate` date NULL DEFAULT NULL,
`tHour` time NULL DEFAULT NULL,
`tID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`tID`) USING BTREE
) ENGINE = InnoDB;
-- ----------------------------
-- Records of tusers
-- ----------------------------
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:54:42', 1);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:43:38', 2);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '07:13:09', 3);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '06:31:52', 4);
INSERT INTO `tusers` VALUES ('Controneri', '2022-01-06', '06:13:12', 5);
SELECT
a.tID,
a.tDate,
a.tHour,
a.tIDUSER,
TIMEDIFF( a.tHour, b.tHour ) AS tDif
FROM
`tusers` a
JOIN `tusers` b ON
a.tDate = b.tDate
AND a.tIDUSER = b.tIDUSER
AND a.tID > b.tID
WHERE
( TIMEDIFF( a.tHour, b.tHour ) BETWEEN '00:00:00' AND '01:00:00' )
ORDER BY
a.tIDUSER,
a.tDate,
a.tHour ASC;
Upvotes: 0
Views: 80
Reputation: 12998
For MySQL 5.5 you can achieve this by tracking the previous values in user variables -
SELECT tIDUSER, tDate, tHour, tID
FROM (
SELECT
tusers.*,
IF(@prev_date_time IS NULL OR @prev_user <> tIDUSER OR @prev_date_time + INTERVAL 1 HOUR < TIMESTAMP(tDate, tHour), @prev_date_time := TIMESTAMP(tDate, tHour), NULL) AS `show`,
@prev_user := tIDUSER
FROM tusers, (SELECT @prev_date_time := NULL, @prev_user := NULL) n
ORDER BY tIDUSER ASC, tDate ASC, tHour ASC
) t
WHERE `show` IS NOT NULL
ORDER BY tIDUSER ASC, tDate ASC, tHour ASC;
And here's a db<>fiddle. Thanks to sticky bit as I took the liberty of "borrowing" from their db<>fiddle.
The MySQL 5.6 manual states -
However, the order of evaluation for expressions involving user variables is undefined.
And in later versions is extended to say -
The order of evaluation for expressions involving user variables is undefined. For example, there is no guarantee that SELECT @a, @a:=@a+1 evaluates @a first and then performs the assignment.
The MySQL 5.7 manual also states -
It is also possible to assign a value to a user variable in statements other than SET. (This functionality is deprecated in MySQL 8.0 and subject to removal in a subsequent release.) When making an assignment in this way, the assignment operator must be := and not = because the latter is treated as the comparison operator = in statements other than SET:
Despite the above warnings, this method has been widely used for many years. Your mileage may vary.
I suspect this will perform badly with larger result sets but give it a try.
Upvotes: 1
Reputation: 37472
As requested by the OP in the comments, here is a query using recursive CTEs which will be available with MySQL version 8 and higher.
WITH RECURSIVE
cte1
AS
(
SELECT tusers.tiduser,
tusers.tdate,
tusers.thour,
tusers.tid,
addtime(tusers.tdate, tusers.thour) AS sane_timestamp_representation,
row_number() OVER (PARTITION BY tusers.tiduser
ORDER BY addtime(tusers.tdate, tusers.thour) ASC) AS rn
FROM tusers
),
cte2
AS
(
SELECT cte1.tiduser,
cte1.tdate,
cte1.thour,
cte1.tid,
cte1.sane_timestamp_representation,
0 AS n
FROM cte1
UNION ALL
SELECT cte1.tiduser,
cte1.tdate,
cte1.thour,
cte1.tid,
cte1.sane_timestamp_representation,
cte2.n + 1 AS n
FROM cte2
INNER JOIN cte1
ON cte2.tiduser = cte1.tiduser
AND cte1.sane_timestamp_representation > adddate(cte2.sane_timestamp_representation, INTERVAL 1 HOUR)
),
cte3
AS
(
SELECT cte2.tiduser,
cte2.tdate,
cte2.thour,
cte2.tid,
cte2.sane_timestamp_representation,
row_number() OVER (PARTITION BY cte2.tiduser,
cte2.n
ORDER BY cte2.sane_timestamp_representation ASC) rn
FROM cte2
)
SELECT cte3.tiduser,
cte3.tdate,
cte3.thour,
cte3.tid
FROM cte3
WHERE cte3.rn = 1
ORDER BY cte3.tiduser ASC,
cte3.sane_timestamp_representation ASC;
1.
In cte1
we first and foremost unite that day and hour part of the timestamp (not the brightest idea to save them as two different columns; it'll become a mess when day boundaries have to be crossed). We also assign a row_number()
rn
according to the timestamp in ascending order per user. cte1
acts as our "base table" from now on.
2.
Now in cte2
the recursiveness happens. As anchor we take all the rows from cte1
where cte1.rn = 1
. These are the records for a user with the minimum timestamp for that user. We also add some number n
. For those initial anchor rows we set n
to 0
. n
will act as an indicator which rows cannot cover each other. All rows with an n + x
for x > 1
cannot be covered by any row with n
(per user).
In the recursive step we join all records from cte1
past an hour per user. Since these cannot be covered by the records already in the result set (per user), they're past an hour, we assign n + 1
as n
to them.
3.
cte3
adds another row_number()
rn
ordering the records by the timestamp ascending per user
and n
. Those with an rn
of 1
aren't covered themselves by any previous record for the user because all others with equal or greater n
have greater timestamps and those with lesser n
don't cover them as per we constructed n
. So we can select these records from cte3
where rn = 1
and get our final result.
The intermediate result sets will grow rapidly! You can try to select from cte3
without a WHERE
clause and see for yourself. So while this shows it can be done theoretically, it might not be practical, even for medium sets of data. The needed resources can quickly exceed maximums.
(And well, since AFAIK SQL with recursive CTEs is Turing complete and the problem seems well computable, it was clear that it can be done anyway. But it still was interesting to see an example how it can be done, I think.)
Maybe it can be optimized. The key, I believe, is to limit the joined rows in the recursive step. We actually only need to join the oldest record past an hour, that would be the next record of interest. That would also make cte3
and the WHERE
in the final SELECT
unnecessary (unless for projection to get rid of the helper columns). But I didn't find a way to do so. LIMIT
as well as window functions aren't allowed or implemented for recursive CTEs, at least in the recursive step. But if somebody comes up with such an optimization, I'd love to see it!
Oh, and the stupid timestamp representation in two columns, which needs to be put together at first, will also render the use of indexes on the timestamps impossible. So that's another factor limiting performance here.
Upvotes: 1