Reputation: 172
I've been battling with a MySQL deadlock problem for some time. We have a lot of tables logging data, that then have after-insert triggers that extract per minute statistics/summary data saved to another summary table. Obviously this would result in multiple of these Inserts affecting the same row's in the summary table. But since nothing is waiting on the results of the insert to continue, this should not cause a deadlock. The Inserts are done in batches - using batched inserts every few milliseconds. And they can be done from different applications at the same time. Since these batch insert statements are never part of a larger transaction, I don't quite understand why it would result in deadlocks. If somebody can explain why exactly it happens it would be much appreciated! From the error log, I just see multiple lines with:
RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `logschema`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap
Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
Now, It seems like I finally managed to get rid of the deadlock by manually doing a mysql table lock with a "lock tables" statement before doing the batch insert. I know it is very much frowned upon to do table-level locks on an innodb table, but since I've added this table lock I haven't seen the deadlock happen.
Does it make sense that a Table-level lock will resolve such a deadlock issue? And is it an acceptable way to solve this type of problem or should table locks be avoided at all cost when using an innodb table?
EDIT: The summary table looks like this:
CREATE TABLE `table_summary_stats` (
`id` bigint DEFAULT NULL,
`DateAndTime` datetime NOT NULL,
`address` varchar(45) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`group` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`result` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`count` int DEFAULT NULL,
PRIMARY KEY (`DateAndTime`,`group`,`result`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (to_days(`DateAndTime`))
(PARTITION p_2020_10_26 VALUES LESS THAN (738090) ENGINE = InnoDB,
PARTITION p_2020_11_10 VALUES LESS THAN (738105) ENGINE = InnoDB,
PARTITION overflow VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
And the trigger does this:
INSERT INTO table_summary_stats
SET
DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
address = NEW.address,
group = NEW.group,
result = NEW.result,
count = 1
on duplicate key
update
count = count + 1
And the following is the relevant deadlock information:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-11-02 20:00:53 0x7f0cc032a700
*** (1) TRANSACTION:
TRANSACTION 7600352761, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 3
MySQL thread id 874850, OS thread handle 139654885635840, query id 3299800570 10.15.0.91 cdrwriter update
INSERT INTO table_summary_stats
SET
DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
address = NEW.address,
group = NEW.group,
result = NEW.result,
count = 1
on duplicate key
update
count = count + 1
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352761 lock_mode X locks rec but not gap
Record lock, heap no 10 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
0: len 5; hex 99a7c53ec0; asc > ;;
1: len 4; hex 74657374; asc test;;
2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc { "status": "error", "code; (total 76 bytes);
3: len 11; hex 3933373931303130353131; asc 93791010511;;
4: len 6; hex 0001c5042df9; asc - ;;
5: len 7; hex 01000053520238; asc SR 8;;
6: SQL NULL;
7: len 4; hex 80057c22; asc |";;
8: len 8; hex 80000000642f4d05; asc d/M ;;
9: len 8; hex 8000000000c03473; asc 4s;;
10: len 8; hex 800000001a7e7aee; asc ~z ;;
11: len 8; hex 8000000000f2b5b1; asc ;;
12: len 8; hex 800000008060b217; asc ` ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352761 lock_mode X locks rec but not gap waiting
Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
0: len 5; hex 99a7c54000; asc @ ;;
1: len 4; hex 74657374; asc test;;
2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc { "status": "error", "code; (total 76 bytes);
3: len 11; hex 3933373931303130353131; asc 93791010511;;
4: len 6; hex 0001c5042cdc; asc , ;;
5: len 7; hex 02000004ea07ff; asc ;;
6: SQL NULL;
7: len 4; hex 8003095b; asc [;;
8: len 8; hex 8000000036a3a0bb; asc 6 ;;
9: len 8; hex 8000000000785507; asc xU ;;
10: len 8; hex 800000000e23089a; asc # ;;
11: len 8; hex 80000000008c8e08; asc ;;
12: len 8; hex 8000000045cb8c64; asc E d;;
*** (2) TRANSACTION:
TRANSACTION 7600352476, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 75
MySQL thread id 874775, OS thread handle 139672774735616, query id 3299800787 10.15.0.90 cdrwriter update
INSERT INTO table_summary_stats
SET
DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
address = NEW.address,
group = NEW.group,
result = NEW.result,
count = 1
on duplicate key
update
count = count + 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap
Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
0: len 5; hex 99a7c54000; asc @ ;;
1: len 4; hex 74657374; asc test;;
2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc { "status": "error", "code; (total 76 bytes);
3: len 11; hex 3933373931303130353131; asc 93791010511;;
4: len 6; hex 0001c5042cdc; asc , ;;
5: len 7; hex 02000004ea07ff; asc ;;
6: SQL NULL;
7: len 4; hex 8003095b; asc [;;
8: len 8; hex 8000000036a3a0bb; asc 6 ;;
9: len 8; hex 8000000000785507; asc xU ;;
10: len 8; hex 800000000e23089a; asc # ;;
11: len 8; hex 80000000008c8e08; asc ;;
12: len 8; hex 8000000045cb8c64; asc E d;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap waiting
Record lock, heap no 10 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
0: len 5; hex 99a7c53ec0; asc > ;;
1: len 4; hex 74657374; asc test;;
2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc { "status": "error", "code; (total 76 bytes);
3: len 11; hex 3933373931303130353131; asc 93791010511;;
4: len 6; hex 0001c5042df9; asc - ;;
5: len 7; hex 01000053520238; asc SR 8;;
6: SQL NULL;
7: len 4; hex 80057c22; asc |";;
8: len 8; hex 80000000642f4d05; asc d/M ;;
9: len 8; hex 8000000000c03473; asc 4s;;
10: len 8; hex 800000001a7e7aee; asc ~z ;;
11: len 8; hex 8000000000f2b5b1; asc ;;
12: len 8; hex 800000008060b217; asc ` ;;
*** WE ROLL BACK TRANSACTION (1)
Upvotes: 2
Views: 2642
Reputation: 142298
"The Inserts are done in batches" -- Sort each batch by the 4-column PK. This should eliminate many deadlocks and turn the rest into "lock waits". (That is, when there would have been a deadlock, it can simply wait for the other connection to finish.)
Also, if practical, limit batches to 100 rows.
It is almost always useless to have the PRIMARY KEY
start with the Partition key.
(I agree that you should try to avoid LOCK TABLES
.)
Explanation
The classic deadlock is:
I grab row number 1, you grab row 2, then I reach for row 2 (but can't get it) and you reach for row 1 (and can't get it). Neither of us is willing to let go of what we have.
So a referee steps in and forces one of us to give back when he has, letting the other proceed to completion.
It is impossible (or impractical) for me (or you) to grab all the rows needed; so the rows are actually grabbed one at a time. Think of a giant UPDATE
that is changing millions of rows. It is not wise to stop everything while I grab all those rows.
This is called "optimistic" -- The processing assumes it will succeed and plows ahead. And 99.999...% of the time a typical transaction will finish before any other connection conflicts with it.
If we grab the rows in the same "order" (such as PRIMARY KEY
order), one of us can finish; the other can simply wait. If the wait is only milliseconds, then the delay is imperceptible. (Limiting the batch size helps here.)
Better?
It may be better (that is, faster and less likely to deadlock) to get rid of the trigger and simply do two batch statements -- one to the original batch INSERT
, the other to batch Upsert (aka IODKU) the summary table.
In any case, catch errors in transactions and replay the entire transaction.
More discussion of high-speed inserting: http://mysql.rjweb.org/doc.php/staging_table (Though not directly applicable, you may find some related tips.)
Upvotes: 3