Reputation: 1
We have a 5-node MariaDB Galera cluster. In our case, a MyISAM table was dropped, after which the entire cluster crashed. I have read other blogs/questions, but I need some knowledge of the internal workings in this case.
**I need to understand:

1. A `DROP TABLE` command was initiated, and from the logs it is clear that the record being deleted could not be found on the other nodes (as the table was MyISAM), so consistency was compromised and the cluster went down. For MyISAM, Galera replicates only the DDL statement, so why does it check the table's internal records across all nodes? Shouldn't it drop the table directly without comparing records on other nodes, since it is a `DROP` and not a `DELETE` statement? (FYI: the table was present on all nodes, but with different records.)
2. Any recommendation to avoid this, or any patch/bug fix to apply?
3. Any other point I am missing?**
MariaDB version: 10.1.33

Logs:
2018-08-16 15:55:26 140366938635008 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
**2018-08-16 16:24:27 140366938635008 [ERROR] Slave SQL: Error 'Unknown table 'DB_TOTAL4DEV_P.users_last_login_2'' on query. Default database: ''. Query: 'drop table `DB_TOTAL4DEV_P`.`users_last_login_2`', Internal MariaDB error code: 1051**
**2018-08-16 16:24:27 140366938635008 [Warning] WSREP: RBR event 1 Query apply warning: 1, 309050572
2018-08-16 16:24:27 140366938635008 [Warning] WSREP: Ignoring error for TO isolated action: source: 4533077b-9f57-11e8-94aa-6b070f22a4b5 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 492382 trx_id: -1 seqnos (l: 805550, g: 309050572, s: 309050571, d: 309050571, ts: 223731321595519)**
2018-08-16 16:24:28 140366938635008 [ERROR] Slave SQL: Error 'Unknown table 'DB_TOTAL4DEV_P.users_last_login_2'' on query. Default database: ''. Query: 'drop table `DB_TOTAL4DEV_P`.`users_last_login_2`', Internal MariaDB error code: 1051
2018-08-16 16:24:28 140366938635008 [Warning] WSREP: RBR event 1 Query apply warning: 1, 309050575
2018-08-16 16:24:28 140366938635008 [Warning] WSREP: Ignoring error for TO isolated action: source: 56daeaa4-9e8e-11e8-b891-6f203bac329a version: 3 local: 0 state: APPLYING flags: 65 conn_id: 708893 trx_id: -1 seqnos (l: 805553, g: 309050575, s: 309050574, d: 309050574, ts: 310032366214700)
2018-08-16 16:28:45 140366938635008 [ERROR] mysqld: Can't find record in 'users_last_login'
2018-08-16 16:28:45 140366938635008 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB_TOTAL4DEV_P.users_last_login; Can't find record in 'users_last_login', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1862, Internal MariaDB error code: 1032
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 309051933
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: Failed to apply app buffer: seqno: 309051933, status: 1
2018-08-16 16:28:45 140366938635008 [ERROR] mysqld: Can't find record in 'users_last_login'
2018-08-16 16:28:45 140366938635008 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB_TOTAL4DEV_P.users_last_login; Can't find record in 'users_last_login', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1862, Internal MariaDB error code: 1032
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 309051933
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: Failed to apply app buffer: seqno: 309051933, status: 1
2018-08-16 16:28:45 140366938635008 [ERROR] mysqld: Can't find record in 'users_last_login'
2018-08-16 16:28:45 140366938635008 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB_TOTAL4DEV_P.users_last_login; Can't find record in 'users_last_login', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1862, Internal MariaDB error code: 1032
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 309051933
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: Failed to apply app buffer: seqno: 309051933, status: 1
2018-08-16 16:28:45 140366938635008 [ERROR] mysqld: Can't find record in 'users_last_login'
2018-08-16 16:28:45 140366938635008 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB_TOTAL4DEV_P.users_last_login; Can't find record in 'users_last_login', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1862, Internal MariaDB error code: 1032
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 309051933
2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Failed to apply trx: source: 808e3422-9dcd-11e8-bbc0-3ab028afd1b7 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 893106 trx_id: 15068161908 seqnos (l: 806931, g: 309051933, s: 309051932, d: 309051868, ts: 393116532516870)
**2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Failed to apply trx 309051933 4 times
2018-08-16 16:28:45 140366938635008 [ERROR] WSREP: Node consistency compromised, aborting...**
2018-08-16 16:28:45 140366938635008 [Note] WSREP: Closing send monitor...
2018-08-16 16:28:45 140366938635008 [Note] WSREP: Closed send monitor.
2018-08-16 16:28:45 140366938635008 [Note] WSREP: gcomm: terminating thread
2018-08-16 16:28:45 140366938635008 [Note] WSREP: gcomm: joining thread
2018-08-16 16:28:45 140366938635008 [Note] WSREP: gcomm: closing backend
2018-08-16 16:28:45 140366938635008 [Note] WSREP: (590a3f66, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.134.18.25:4567 tcp://10.134.18.5:4567
2018-08-16 16:28:45 140366938635008 [Note] WSREP: declaring 7d266759 at tcp://10.134.19.4:4567 stable
2018-08-16 16:28:45 140366938635008 [Note] WSREP: declaring 808e3422 at tcp://10.134.18.4:4567 stable
2018-08-16 16:28:45 140366938635008 [Note] WSREP: forgetting 4533077b (tcp://10.134.18.5:4567)
2018-08-16 16:28:45 140366938635008 [Note] WSREP: forgetting 56daeaa4 (tcp://10.134.18.25:4567)
2018-08-16 16:28:45 140366938635008 [Note] WSREP: (590a3f66, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-08-16 16:28:45 140366938635008 [Note] WSREP: Node 590a3f66 state prim
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: user message in state LEAVING
2018-08-16 16:28:45 140366938635008 [Warning] WSREP: 590a3f66 sending install message failed: Transport endpoint is not connected
2018-08-16 16:28:48 140366938635008 [Note] WSREP: (590a3f66, 'tcp://0.0.0.0:4567') connection to peer 808e3422 with addr tcp://10.134.18.4:4567 timed out, no messages seen in PT3S
2018-08-16 16:28:48 140366938635008 [Note] WSREP: (590a3f66, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.134.18.4:4567
2018-08-16 16:28:49 140366938635008 [Note] WSREP: (590a3f66, 'tcp://0.0.0.0:4567') reconnecting to 808e3422 (tcp://10.134.18.4:4567), attempt 0
2018-08-16 16:28:50 140366938635008 [Note] WSREP: cleaning up 56daeaa4 (tcp://10.134.18.25:4567)
2018-08-16 16:28:50 140366938635008 [Note] WSREP: cleaning up 4533077b (tcp://10.134.18.5:4567)
2018-08-16 16:28:50 140366938635008 [Note] WSREP: evs::proto(590a3f66, LEAVING, view_id(REG,590a3f66,613)) suspecting node: 7d266759
2018-08-16 16:28:50 140366938635008 [Note] WSREP: evs::proto(590a3f66, LEAVING, view_id(REG,590a3f66,613)) suspecting node: 808e3422
2018-08-16 16:28:50 140366938635008 [Note] WSREP: evs::proto(590a3f66, LEAVING, view_id(REG,590a3f66,613)) suspected node without join message, declaring inactive
2018-08-16 16:28:50 140366938635008 [Note] WSREP: view(view_id(NON_PRIM,590a3f66,613) memb {
2018-08-16 16:28:50 140366938635008 [Note] WSREP: view((empty))
2018-08-16 16:28:50 140366938635008 [Note] WSREP: gcomm: closed
2018-08-16 16:28:50 140366636377856 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Flow-control interval: [16, 16]
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Trying to continue unpaused monitor
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Received NON-PRIMARY.
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 309051933)
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Received self-leave message.
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Flow-control interval: [0, 0]
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Trying to continue unpaused monitor
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-08-16 16:28:50 140366636377856 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 309051933)
2018-08-16 16:28:50 140366636377856 [Note] WSREP: RECV thread exiting 0: Success
2018-08-16 16:28:50 140366938635008 [Note] WSREP: recv_thread() joined.
2018-08-16 16:28:50 140366938635008 [Note] WSREP: Closing replication queue.
2018-08-16 16:28:50 140366938635008 [Note] WSREP: Closing slave action queue.
2018-08-16 16:28:50 140366938635008 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Thanks for any help. :) Please excuse my basic MariaDB/Galera knowledge.
Upvotes: 0
Views: 773
Reputation: 1
We successfully reproduced the issue on a test setup. The cause was that a MyISAM table was converted to InnoDB and then dropped. As the table had different data on each node, MySQL stopped on all the other nodes and the Galera cluster crashed.
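The reproduction can be sketched as follows (hypothetical database/table names; assumes `wsrep_replicate_myisam` is `OFF`, which is the default):

```sql
-- DDL is replicated to all nodes via TOI, but MyISAM DML is not,
-- so this row ends up existing on the local node only.
CREATE TABLE test.t1 (id INT PRIMARY KEY) ENGINE=MyISAM;
INSERT INTO test.t1 VALUES (1);

-- The ALTER replicates as DDL, but each node rebuilds the table from
-- its own local (already divergent) MyISAM data, so the divergence
-- is silently carried over into InnoDB.
ALTER TABLE test.t1 ENGINE=InnoDB;

-- The row now exists on one node only. This DELETE is replicated as a
-- row-based event; the other nodes cannot find the row and fail with
-- error 1032 (HA_ERR_KEY_NOT_FOUND), then abort with
-- "Node consistency compromised", matching the logs above.
DELETE FROM test.t1 WHERE id = 1;
```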
After converting to InnoDB, any new data is replicated, but existing data that already differs across nodes remains as it is in the table. Now, if we drop the table or delete an unreplicated record, that record is not present on the other nodes, and hence the cluster crashes. So you need to verify this point before changing a table's storage engine.
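Before converting, a minimal pre-migration check (run on every node and compared across the cluster; the table name below is taken from the logs) might look like:

```sql
-- List tables not using InnoDB: these are candidates for
-- divergent, unreplicated data.
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE ENGINE <> 'InnoDB'
  AND TABLE_SCHEMA NOT IN ('information_schema', 'performance_schema', 'mysql');

-- Run on each node; counts and checksums must match everywhere
-- before (and after) the engine change is safe.
SELECT COUNT(*) FROM DB_TOTAL4DEV_P.users_last_login;
CHECKSUM TABLE DB_TOTAL4DEV_P.users_last_login;
```

If the counts or checksums differ between nodes, reconcile the data (e.g. reload the table from one authoritative node) before running the `ALTER TABLE ... ENGINE=InnoDB`.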
Upvotes: 0