Reputation: 1
One of our MariaDB/Galera clusters crashed last week. We started a new cluster with the first node, joined the second node, but couldn't join a third node.
We removed all files from the data directory, and the node started an SST. But it seems MySQL is picking up a cached UUID from somewhere, and after the transfer it cannot start and join the cluster. Logs:
2021-07-31 19:01:51 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-07-31 19:01:51 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2021-07-31 19:01:52 2 [Note] WSREP: State transfer required:
**Group state: 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649**
Local state: 00000000-0000-0000-0000-000000000000:-
2021-07-31 19:01:52 2 [Note] WSREP: New cluster view: global state: 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649, view# 5: Primary, number of nodes: 3, my index: 2, protocol version 3
2021-07-31 19:01:52 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-07-31 19:01:52 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.73.64.104' --datadir '/media/dados/mysql/' --parent '28752' '' '''
2021-07-31 19:01:52 2 [Note] WSREP: Prepared SST request: rsync|10.73.64.104:4444/rsync_sst
2021-07-31 19:01:52 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-07-31 19:01:52 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-07-31 19:01:52 2 [Note] WSREP: Assign initial position for certification: 581967649, protocol version: 4
2021-07-31 19:01:52 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (6148b40a-ef57-11eb-92ab-77aa611985cb): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2021-07-31 19:01:52 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 581967650)
2021-07-31 19:01:52 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2021-07-31 19:01:52 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649
2021-07-31 19:55:01 0 [Note] WSREP: SST complete, seqno: 581967651
2021-07-31 19:55:04 0 [Note] WSREP: SST received: ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160:581967651
2021-07-31 19:55:04 2 [ERROR] WSREP: Application received wrong state:
**Received: ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160**
Required: 6148b40a-ef57-11eb-92ab-77aa611985cb
2021-07-31 19:55:04 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
The cluster is running with UUID 6148b40a-ef57-11eb-92ab-77aa611985cb, but after the SST this node receives UUID ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160.
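To narrow down where the old UUID might be coming from, one option is to compare the `uuid:` line of each node's grastate.dat with the UUID the running cluster reports. A minimal sketch, assuming the usual `key: value` layout of grastate.dat; `grastate_uuid` and `uuid_matches` are hypothetical helper names, not MariaDB tools:

```shell
#!/usr/bin/env bash
# Sketch: read the saved cluster UUID from a Galera grastate.dat file.
# Assumes lines of the form "uuid:    <uuid>" (hypothetical helper name).
grastate_uuid() {
  local file="$1"
  # Print the second whitespace-separated field of the "uuid:" line.
  awk '$1 == "uuid:" { print $2 }' "$file"
}

# Return 0 if the saved UUID in file $1 equals the expected UUID $2.
uuid_matches() {
  [ "$(grastate_uuid "$1")" = "$2" ]
}

# Example usage on a node (path from the --datadir in the log):
#   grastate_uuid /media/dados/mysql/grastate.dat
# UUID reported by the running cluster, for comparison:
#   mysql -N -e "SHOW STATUS LIKE 'wsrep_cluster_state_uuid'"
```

Running the first command on the donor as well as the joiner shows whether any node still has the older UUID saved on disk.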
Do you have any idea how to solve this issue?
Thanks, Fernando
Upvotes: 0
Views: 1701
Reputation: 2612
What is your wsrep_sst_donor value?
Have you started with an empty datadir, in particular without a grastate.dat file?
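A quick pre-flight check on the joiner can confirm the datadir really is empty before MariaDB is started. A minimal sketch; `datadir_is_empty` is a hypothetical helper name, and the path in the usage line is the `--datadir` from the question's log:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the given datadir exists and is completely empty.
# datadir_is_empty is a hypothetical helper, not part of MariaDB.
datadir_is_empty() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir is not a directory" >&2
    return 1
  fi
  # A leftover grastate.dat is read as the node's saved state on startup.
  if [ -e "$dir/grastate.dat" ]; then
    echo "grastate.dat still present in $dir" >&2
    return 1
  fi
  # ls -A also lists hidden files, but not "." and "..".
  if [ -n "$(ls -A "$dir")" ]; then
    echo "$dir is not empty" >&2
    return 1
  fi
  return 0
}
```

Usage: `datadir_is_empty /media/dados/mysql && sudo systemctl start mariadb`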
Have you tried increasing the systemd start timeout of the MariaDB service on that node?
sudo mkdir -p /etc/systemd/system/mariadb.service.d
sudo tee /etc/systemd/system/mariadb.service.d/timeoutstartsec.conf <<EOF
[Service]
TimeoutStartSec=1200
EOF
sudo systemctl daemon-reload
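To pick a TimeoutStartSec value, it may help to measure how long the last SST actually took. In the question's log the transfer is requested at 19:01:52 and completes at 19:55:01. A minimal sketch, assuming GNU `date` (BSD/macOS `date -d` behaves differently); `elapsed_seconds` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: seconds between two "YYYY-MM-DD HH:MM:SS" timestamps (GNU date).
# elapsed_seconds is a hypothetical helper name.
elapsed_seconds() {
  local start end
  # -u parses both timestamps as UTC, so no DST shift can creep in.
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( end - start ))
}

# Timestamps copied from the log: SST requested vs. SST complete.
sst_secs=$(elapsed_seconds "2021-07-31 19:01:52" "2021-07-31 19:55:01")
echo "SST took ${sst_secs}s"   # prints "SST took 3189s"
```

That is roughly 53 minutes, well above 1200 s, so if the systemd start timeout turns out to be the limit, a larger value would be needed for a transfer of this size.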
Upvotes: 0