Hedin

Reputation: 45

Aerospike migration aborted

We tried to upgrade our Aerospike version and ran into a strange problem. We had a 3-node cluster running version 3.5.4 with replication factor 2.

We decided to upgrade to 3.8.2.3, so we installed the new version on a new server and added it to the cluster as a new node. After migration finished, we removed an old node. Everything went perfectly.

We then repeated the same procedure: we added one more new node to the cluster, and this time migration failed. The logs contained many warnings like the one below.

Jun 06 2016 22:43:26 GMT: WARNING (partition): (partition.c::2221) {namespace:3368} migrate rx aborted. During migrate receive start, duplicate partition contains primary version

In addition, we saw that the count of replica objects is lower than the count of master objects, for example:

Our migration config:

So, how can we fix this situation?

Upvotes: 2

Views: 142

Answers (1)

kporter

Reputation: 2768

I see from your output that there aren't any migrations in progress, yet the replica counts do not match the primary counts.

Prior to 3.7.0.1, migrations from a previous round could interfere with subsequent rounds. I suspect that is what happened here. I recommend that you continue the upgrade and disregard these issues for now. If the counts still do not match once the upgrade is complete, you will need to force the partitions to resync.

To force partitions to resync, issue the following commands:

asadm -h [NODE IP] -e "cluster dun all";
sleep 10;
asadm -h [NODE IP] -e "cluster undun all";

This will cause all partition versions to diverge and resync.
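To decide whether the resync is needed after the upgrade, the master and replica totals can be compared across the cluster. The following is only a rough sketch, not an official tool: it assumes the per-node namespace statistics are available as the semicolon-separated `key=value` string that `asinfo` returns, and that the relevant statistics are named `master-objects` and `prole-objects` (the names I would expect on pre-3.9 servers; check your own output). The helper names `stat_value` and `counts_match` are made up for illustration. With replication factor 2, the two totals should be equal once migrations have finished.

```shell
#!/bin/sh
# Extract one statistic from a semicolon-separated "key=value" string.
# $1: stats string, $2: statistic name
stat_value() {
    printf '%s\n' "$1" | tr ';' '\n' |
        awk -F= -v key="$2" '$1 == key { print $2; exit }'
}

# Sum master and replica (prole) objects over one stats string per node.
# Prints both totals and returns 0 only when they are equal.
# Assumption: statistic names are master-objects and prole-objects.
counts_match() {
    master=0
    prole=0
    for stats in "$@"; do
        m=$(stat_value "$stats" master-objects)
        p=$(stat_value "$stats" prole-objects)
        master=$((master + ${m:-0}))
        prole=$((prole + ${p:-0}))
    done
    echo "master=$master prole=$prole"
    [ "$master" -eq "$prole" ]
}

# Possible usage, one asinfo call per node (placeholders left unfilled):
# counts_match "$(asinfo -h [NODE IP] -v 'namespace/[NAMESPACE]')" ... \
#     || echo "counts differ, consider the dun/undun resync"
```

Run the check only after migrations have completed on every node; counts taken while migrations are still running will differ for ordinary reasons.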

Upvotes: 2
