Reputation: 11942
I have an Aerospike (3.11.1.1) cluster with 6 nodes. When I add a new node, some objects are sometimes "temporarily" lost while the cluster is migrating data. After the migration finishes, the missing data come back. Is this a bug, or am I doing something wrong? How can I avoid it?
Notice that while migration is happening, the master object count is lower than the actual final object count after migration finishes.
Master and replica count before finishing migrations:
Master and replica count after finishing migrations:
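Per-node counts like the ones above can be pulled with the Python client's info calls. A minimal sketch, assuming the aerospike Python client, an example seed host, and the hyphenated stat names that 3.x servers report (master-objects / prole-objects; newer servers use underscores):

import aerospike

# Example seed node; info_all fans the request out to every node in the cluster.
client = aerospike.client({'hosts': [('10.240.0.32', 3000)]}).connect()

def namespace_stats(ns):
    """Return {node: {stat: value}} parsed from the info protocol response."""
    out = {}
    for node, (err, resp) in client.info_all('namespace/' + ns).items():
        if err or not resp:
            continue
        out[node] = dict(kv.split('=', 1) for kv in resp.strip().split(';') if '=' in kv)
    return out

for node, stats in sorted(namespace_stats('mynamespace').items()):
    # 3.x servers use hyphenated stat names; newer servers use underscores.
    print(node, stats.get('master-objects'), stats.get('prole-objects'))

client.close()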
My aerospike.conf:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    paxos-recovery-policy auto-reset-master
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 4
    batch-index-threads 40
    proto-fd-max 15000
    batch-max-requests 30000
    replication-fire-and-forget true
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        #address any
        port 3000
    }

    heartbeat {
        mode mesh
        mesh-seed-address-port 10.240.0.32 3002
        mesh-seed-address-port 10.240.0.33 3002
        port 3002
        interval 150
        timeout 20
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace mynamespace {
    replication-factor 2
    memory-size 1500M
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    write-commit-level-override master

    storage-engine device {
        file /data/aerospike.dat
        #device /dev/sdb
        write-block-size 1M
        filesize 280G
    }
}
Upvotes: 0
Views: 443
Reputation: 2768
Some of the discrepancy was due to an issue in the original migration/rebalance design and is addressed by the protocol change in Aerospike 3.13. Prior to the protocol change in 3.13, when running replication-factor 2, the operator must upgrade one node at a time and wait for migrations to complete afterwards; a sketch of that wait step follows below.
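A hedged sketch of the wait-between-upgrades step, using the Python client's info calls. The seed host is just an example, and the migrate_progress_send / migrate_progress_recv stat names are an assumption based on what this 3.x line exposes; newer servers report per-namespace remaining-partition stats instead, so verify against your nodes' asinfo -v statistics output:

import time
import aerospike

# Example seed node; adjust to your cluster.
client = aerospike.client({'hosts': [('10.240.0.32', 3000)]}).connect()

def pending_migrations():
    """Sum outstanding migrations reported by every node (best effort)."""
    total = 0
    for node, (err, resp) in client.info_all('statistics').items():
        if err or not resp:
            continue
        stats = dict(kv.split('=', 1) for kv in resp.strip().split(';') if '=' in kv)
        # Stat names vary by server version; these are the 3.x service-level ones.
        total += int(stats.get('migrate_progress_send', 0))
        total += int(stats.get('migrate_progress_recv', 0))
    return total

# Only proceed with the next node's upgrade once the cluster reports
# no in-flight migrations.
while pending_migrations() > 0:
    time.sleep(10)

client.close()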
Additional discrepancy comes from Aerospike avoiding over-counting master-objects and replica objects (i.e. prole-objects) during migration. Also, with 3.13 we added a stat for non-replica-objects, which are objects that are not currently acting as master or replica. These are either (a) objects on a partition that has inbound migrations and will eventually act as replica, or (b) objects on a partition that will not participate and will be dropped when migrations terminate for the partition.
Prior to 3.13, non-replica-objects of type (a) would reduce the counts for master-objects or prole-objects. This is because, prior to the protocol change, when a partition that was previously master returns, it immediately resumes as master even though it doesn't have the new writes that took place while it was away. This isn't optimal behavior, but it isn't losing data, since we will resolve the missing records from the non-replica-objects on other nodes. Post protocol change, a returning 'master' partition will not resume as 'master' until it has received all migrations from other nodes.
Prior to 3.13, non-replica-objects of type (b) would be dropped immediately, which would reduce the count for prole-objects. This causes the replication-factor of records written while a node was away to be reduced by one (e.g. replication-factor 2 temporarily becomes replication-factor 1). This is also the reason it was important to wait for migrations to complete before proceeding to upgrade the next node. Post protocol change (unless running in-memory only), it is no longer necessary to wait for migrations to complete between node upgrades, because the interim 'subset partitions' aren't dropped, which prevents a record's replication-factor from being reduced (actually, with the new protocol, during migrations there are often replication-factor + 1 copies of a record).
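One way to confirm the dip is a counting artifact rather than data loss is to compare the cluster-wide master-objects total before adding the node and again after migrations settle. A rough sketch reusing the same info-call pattern as above (seed host is an example; the hyphenated stat name is what 3.x servers report):

import aerospike

client = aerospike.client({'hosts': [('10.240.0.32', 3000)]}).connect()  # example seed

def total_master_objects(ns):
    """Cluster-wide sum of master objects for a namespace (3.x stat name)."""
    total = 0
    for node, (err, resp) in client.info_all('namespace/' + ns).items():
        if err or not resp:
            continue
        stats = dict(kv.split('=', 1) for kv in resp.strip().split(';') if '=' in kv)
        total += int(stats.get('master-objects', 0))
    return total

# Record this total before adding the node and again once migrations finish;
# the two should match even though the value sags while migrations are running.
print(total_master_objects('mynamespace'))
client.close()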
Upvotes: 2