Reputation: 17
I met a disability while using galeracluser and I have a question.
I'm using MariaDB 10.5.13 version with Galera Cluster. The issue I'm facing is that all nodes are slowing down due to a query, resulting in connection timeouts.
Here's the scenario:
CREATE TABLE item_temp AS SELECT COUNT(*) FROM item WHERE item_status = 1;
UPDATE item SET item_status = 3 where id = (select id from item_temp);
4-1. killed maria db of 'A' node. and I tried restarting 'A' maria db but failed
My suspicion is that the update query caused the issue, but I'm unsure why all nodes slowed down. Here are my hypotheses:
The create query might still be in the process list, causing all write transactions to be blocked, which could have had an impact. Alternatively, the update query might have been applied immediately and replicated to other nodes simultaneously, resulting in many node updates and impacting the server. Any insights into why all nodes slowed down would be greatly appreciated.
I want to know why not only the node where the update query occurred slowed down, but also to other nodes
+++
Instance RAM usage is as follows.
The innodb buffer is.
The row size of the item table used in the ddl statement is as follows.
0.18 KB
I was analyzing node c and found the following log, after the create statement occurred.
2024-03-15 13:22:34 8 [ERROR] mysqld: Error writing file 'mariadb-bin' (errno: 1950 "Unknown error 1950") 2024-03-15 13:22:34 8 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.item_temp; An attempt was made to binlog GTID 0-1-117195197 which would create an out-of-order sequence number with existing GTID 0-1-117195197, and gtid strict mode is enabled, Error_code: 1950; Error writing file 'mariadb-bin' (errno: 1950 "Unknown error 1950"), Error_code: 1026; the event's master log FIRST, end_log_pos 319, Internal MariaDB error code: 1950 2024-03-15 13:22:34 8 [Warning] WSREP: Event 4 Write_rows_v1 apply failed: 1, seqno 117208461
Upvotes: 0
Views: 390
Reputation: 14646
Back when Galera was fixed to make CREATE TABLE .. SELECT
work (MDEV-7673) this placed the cluster into TOI mode for this statement to replicate. This implementation is still there.
So recommendation 1, use CREATE TABLE
and then INSERT INTO .. SELECT
, especially if you have a large query that ultimately aggregates to a small result set. This small result set will replicate quickly in a non-TOI mode.
3./4. where as a result of the TOI mode where, as your processlist in (5. image shows), was showing, the CREATE TABLE .. SELECT
was being executed. The lack of database connections is the existing UPDATE/INSERT
queries waiting until TOI is over are consuming all the connections.
5.
The TOI mode CREATE TABLE ... SELECT
running over an hour is odd. Even 2M rows I'd expect quicker. Did you configure a larger innodb buffer pool size to match your instance size? Is this a table with a particularly large row size?
6.
A restart unless something is genuinely stalled is usually a mistake. The rollback can be long.
The non-appearance of UPDATE item SET item_status = 3;
is expected as without a certification result from the cluster this transaction was never committed. As the server was shutdown, this is considered an aborted transaction as as such was rolled back from the undo log on restart.
Upvotes: 1