Reputation: 989
I've been trying to implement the solution here with the added flavour of updating existing records. As an MRE I'm looking to populate the sum_date_diff
column in a table with the sum of all the differences between the current row date and the date of every previous row where the current row p1_id
matches the previous row p1_id
or p2_id
. I have already filled out the expected result below:
+-----+------------+-------+-------+---------------+
| id_ | date_time | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
| 1 | 2000-01-01 | 1 | 2 | Null |
| 2 | 2000-01-02 | 2 | 4 | 1 |
| 3 | 2000-01-04 | 1 | 3 | 3 |
| 4 | 2000-01-07 | 2 | 5 | 11 |
| 5 | 2000-01-15 | 2 | 3 | 35 |
| 6 | 2000-01-20 | 1 | 3 | 35 |
| 7 | 2000-01-31 | 1 | 3 | 68 |
+-----+------------+-------+-------+---------------+
My query so far looks like:
UPDATE test.sum_date_diff AS sdd0
JOIN
(SELECT
id_,
SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
FROM
test.sum_date_diff AS sdd1
LEFT OUTER JOIN (SELECT
sdd2.date_time AS date_time, sdd2.p1_id AS player_id
FROM
test.sum_date_diff AS sdd2 UNION ALL SELECT
sdd3.date_time AS date_time, sdd3.p2_id AS player_id
FROM
test.sum_date_diff AS sdd3) AS sq ON sq.date_time < sdd1.date_time
AND sq.player_id = sdd1.p1_id
GROUP BY sdd1.id_) AS master_sq ON master_sq.id_ = sdd0.id_
SET
sdd0.sum_date_diff = master_sq.sum_date_diff
This works as shown here.
However, on a table of 1.5m records the query has been hanging for the last hour. Even when I add a WHERE
clause onto the bottom to restrict the update to a single record then it hangs for 5 mins+.
Here is the EXPLAIN
statement for the query on the full table:
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| 1 | UPDATE | sum_date_diff | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | const | 10 | 100 | NULL |
| 2 | DERIVED | sum_date_diff | NULL | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index; Using temporary |
| 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2968576 | 100 | Using where; Using join buffer (hash join) |
| 3 | DERIVED | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
| 4 | UNION | sum_date_diff | NULL | index | NULL | ix__match_oc_history__date_time_players | 14 | NULL | 1484288 | 100 | Using index |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
Here is the CREATE TABLE
statement:
CREATE TABLE `sum_date_diff` (
`id_` int NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
`sum_date_diff` int DEFAULT NULL,
PRIMARY KEY (`id_`),
KEY `ix__sum_date_diff__date_time` (`date_time`),
KEY `ix__sum_date_diff__p1_id` (`p1_id`),
KEY `ix__sum_date_diff__p2_id` (`p2_id`),
KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
MySQL version is 8.0.26 running on a 2016 MacBook Pro with Monterey with 16Gb RAM.
After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf
file:
innodb_buffer_pool_size = 8G
tmp_table_size=2G
max_heap_table_size=2G
I'm wondering if:
I'm hoping someone could enlighten me!
Upvotes: 0
Views: 127
Reputation: 142228
Whereas it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc than in SQL.)
LEAD()
and LAG()
are possible, but they are not optimized well (or so is my experience). In an APP language, it is easy and efficient to look up things in arrays.
The SELECT
can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.
Upvotes: 1