Jossy
Jossy

Reputation: 989

How to perform a sum for all previous records

I've been trying to implement the solution here with the added flavour of updating existing records. As an MRE I'm looking to populate the sum_date_diff column in a table with the sum of all the differences between the current row date and the date of every previous row where the current row p1_id matches the previous row p1_id or p2_id. I have already filled out the expected result below:

+-----+------------+-------+-------+---------------+
| id_ | date_time  | p1_id | p2_id | sum_date_diff |
+-----+------------+-------+-------+---------------+
|   1 | 2000-01-01 |     1 |     2 | Null          |
|   2 | 2000-01-02 |     2 |     4 | 1             |
|   3 | 2000-01-04 |     1 |     3 | 3             |
|   4 | 2000-01-07 |     2 |     5 | 11            |
|   5 | 2000-01-15 |     2 |     3 | 35            |
|   6 | 2000-01-20 |     1 |     3 | 35            |
|   7 | 2000-01-31 |     1 |     3 | 68            |
+-----+------------+-------+-------+---------------+

My query so far looks like:

UPDATE test.sum_date_diff AS sdd0
        JOIN
    (SELECT 
        id_,
            SUM(DATEDIFF(sdd1.date_time, sq.date_time)) AS sum_date_diff
    FROM
        test.sum_date_diff AS sdd1
    LEFT OUTER JOIN (SELECT 
        sdd2.date_time AS date_time, sdd2.p1_id AS player_id
    FROM
        test.sum_date_diff AS sdd2 UNION ALL SELECT 
        sdd3.date_time AS date_time, sdd3.p2_id AS player_id
    FROM
        test.sum_date_diff AS sdd3) AS sq ON sq.date_time < sdd1.date_time
        AND sq.player_id = sdd1.p1_id
    GROUP BY sdd1.id_) AS master_sq ON master_sq.id_ = sdd0.id_ 
SET 
    sdd0.sum_date_diff = master_sq.sum_date_diff

This works as shown here.

However, on a table of 1.5m records the query has been hanging for the last hour. Even when I add a WHERE clause onto the bottom to restrict the update to a single record then it hangs for 5 mins+.

Here is the EXPLAIN statement for the query on the full table:

+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
| id | select_type |     table     | partitions | type  |                                                              possible_keys                                                              |                   key                   | key_len |  ref  |  rows   | filtered |                   Extra                    |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+
|  1 | UPDATE      | sum_date_diff | NULL       | const | PRIMARY                                                                                                                                 | PRIMARY                                 | 4       | const |       1 |      100 | NULL                                       |
|  1 | PRIMARY     | <derived2>    | NULL       | ref   | <auto_key0>                                                                                                                             | <auto_key0>                             | 4       | const |      10 |      100 | NULL                                       |
|  2 | DERIVED     | sum_date_diff | NULL       | index | PRIMARY,ix__match_oc_history__date_time,ix__match_oc_history__p1_id,ix__match_oc_history__p2_id,ix__match_oc_history__date_time_players | ix__match_oc_history__date_time_players | 14      | NULL  | 1484288 |      100 | Using index; Using temporary               |
|  2 | DERIVED     | <derived3>    | NULL       | ALL   | NULL                                                                                                                                    | NULL                                    | NULL    | NULL  | 2968576 |      100 | Using where; Using join buffer (hash join) |
|  3 | DERIVED     | sum_date_diff | NULL       | index | NULL                                                                                                                                    | ix__match_oc_history__date_time_players | 14      | NULL  | 1484288 |      100 | Using index                                |
|  4 | UNION       | sum_date_diff | NULL       | index | NULL                                                                                                                                    | ix__match_oc_history__date_time_players | 14      | NULL  | 1484288 |      100 | Using index                                |
+----+-------------+---------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+-------+---------+----------+--------------------------------------------+

Here is the CREATE TABLE statement:

CREATE TABLE `sum_date_diff` (
  `id_` int NOT NULL AUTO_INCREMENT,
  `date_time` datetime DEFAULT NULL,
  `p1_id` int NOT NULL,
  `p2_id` int NOT NULL,
  `sum_date_diff` int DEFAULT NULL,
  PRIMARY KEY (`id_`),
  KEY `ix__sum_date_diff__date_time` (`date_time`),
  KEY `ix__sum_date_diff__p1_id` (`p1_id`),
  KEY `ix__sum_date_diff__p2_id` (`p2_id`),
  KEY `ix__sum_date_diff__date_time_players` (`date_time`,`p1_id`,`p2_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822120 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

MySQL version is 8.0.26 running on a 2016 MacBook Pro with Monterey with 16Gb RAM.

After reading around about boosting the RAM available to MySQL I've added the following to the standard my.cnf file:

innodb_buffer_pool_size = 8G
tmp_table_size=2G
max_heap_table_size=2G

I'm wondering if:

  1. I've done something wrong
  2. This is just a very slow task no matter what I do
  3. There is a faster method

I'm hoping someone could enlighten me!

Upvotes: 0

Views: 127

Answers (1)

Rick James
Rick James

Reputation: 142228

Whereas it is possible to do calculations like this in SQL, it is messy. If the number of rows is not in the millions, I would fetch the necessary columns into my application and do the arithmetic there. (Loops are easier and faster in PHP/Java/etc than in SQL.)

LEAD() and LAG() are possible, but they are not optimized well (or so is my experience). In an APP language, it is easy and efficient to look up things in arrays.

The SELECT can (easily and efficiently) do any filtering and sorting so that the app only receives the necessary data.

Upvotes: 1

Related Questions