Daniel Cukier

Reputation: 11942

Aerospike missing data when adding new node to cluster

I have an Aerospike (3.11.1.1) cluster with 6 nodes. When I add a new node, some objects are sometimes "temporarily" lost while the cluster is migrating data. After the migration finishes, the missing data returns. Is this a bug, or am I doing something wrong? How can I avoid it?

Notice that while migration is in progress, the master object count is lower than the final object count after migration finishes.

Master and replica count before finishing migrations:

[screenshot: master and replica object counts while migrations are in progress]

Master and replica count after finishing migrations:

[screenshot: master and replica object counts after migrations complete]
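
(The screenshots show the namespace's master and replica object totals. They can also be captured as text with something like the asadm sketch below; the exact filter syntax and output layout depend on the tools version.)

    # Show object counts (master, prole, total) for each node in the cluster;
    # run before, during, and after migrations to compare the totals.
    asadm -e "show statistics namespace like objects"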

My aerospike.conf:

service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    paxos-recovery-policy auto-reset-master
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 4
    batch-index-threads 40
    proto-fd-max 15000
    batch-max-requests 30000
    replication-fire-and-forget true
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        #address any
        port 3000
    }

    heartbeat {
        mode mesh
        mesh-seed-address-port 10.240.0.32 3002
        mesh-seed-address-port 10.240.0.33 3002
        port 3002

        interval 150
        timeout 20
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace mynamespace {
    replication-factor 2
    memory-size 1500M
    default-ttl 0 # never expire/evict
    ldt-enabled true
    write-commit-level-override master

    storage-engine device {
        file /data/aerospike.dat
        #device /dev/sdb
        write-block-size 1M
        filesize 280G
    }
}

Upvotes: 0

Views: 443

Answers (1)

kporter

Reputation: 2768

Some of the discrepancy was due to an issue in the original migration/rebalance design, which is addressed by the protocol change in Aerospike 3.13. Prior to the protocol change in 3.13, when running replication-factor 2, the operator must upgrade one node at a time and wait for migrations to complete before moving on to the next node.
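
For example, one way to confirm that migrations have finished on a node before moving to the next is to poll the namespace's migration counters with asinfo. This is only a sketch, assuming the 3.x underscore stat names and a node reachable on localhost; adjust the host and namespace for your cluster:

    # Block until this node reports no pending inbound/outbound partition
    # migrations for the namespace (repeat against each node in turn).
    while true; do
        remaining=$(asinfo -h 127.0.0.1 -v "namespace/mynamespace" \
            | tr ';' '\n' \
            | grep -E 'migrate_(rx|tx)_partitions_remaining' \
            | cut -d= -f2 \
            | paste -sd+ - | bc)
        [ "${remaining:-1}" -eq 0 ] && break
        sleep 5
    done
    echo "migrations complete on this node"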

The additional discrepancy comes from Aerospike avoiding over-counting master-objects and replica objects (i.e. prole-objects) during migration. Also, with 3.13 we added a stat for non-replica-objects, which are objects that are not currently acting as master or replica. These are either (a) objects on a partition that has inbound migrations and will eventually act as replica, or (b) objects on a partition that will not participate and will be dropped when migrations terminate for that partition.
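
For example, while migrations are running you can watch these per-namespace counters directly (again a sketch; non_replica_objects only exists on 3.13+, and exact stat names vary by server version):

    # Dump the object counters for the namespace on one node; summing the
    # master/prole counts across all nodes gives the cluster-wide totals.
    asinfo -h 127.0.0.1 -v "namespace/mynamespace" \
        | tr ';' '\n' \
        | grep -E '^(objects|master_objects|prole_objects|non_replica_objects)='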

Prior to 3.13, non-replica-objects of type (a) would reduce the master-objects or prole-objects counts. This is because, prior to the protocol change, when a partition that was previously master returns, it immediately resumes as master even though it doesn't have the new writes that took place while it was away. This isn't optimal behavior, but it isn't losing data, since the missing records are resolved from the non-replica-objects on other nodes. Post protocol change, a returning 'master' partition will not resume as 'master' until it has received all migrations from other nodes.

Prior to 3.13, non-replica-objects of type (b) would be dropped immediately, reducing the count for prole-objects. This causes the replication-factor of records written while a node was away to be reduced by one (e.g. replication-factor 2 temporarily becomes replication-factor 1). This is also the reason it was important to wait for migrations to complete before proceeding to upgrade the next node. Post protocol change (unless running in-memory only), it is no longer necessary to wait for migrations to complete between node upgrades, because the interim 'subset partitions' aren't dropped, which prevents a record's replication-factor from being reduced (actually, with the new protocol, during migrations there are often replication-factor + 1 copies of a record).

Upvotes: 2
