TinkerTank

Reputation: 5805

Distributed MinIO failure scenarios

We want to run MinIO in a distributed / high-availability setup, but would like to know a bit more about the behavior of MinIO under different failure scenarios. Especially given the read-after-write consistency, I'm assuming that nodes need to communicate.

So what happens if a node drops out? Will there be a timeout from other nodes, during which writes won't be acknowledged? What happens during network partitions (I'm guessing the partition that has quorum will keep functioning), or with flapping or congested network connections? What if a disk on one of the nodes starts going wonky and hangs for tens of seconds at a time? Will the whole cluster pause and wait for it?

Is there any documentation on how MinIO handles failures?

Upvotes: 2

Views: 3518

Answers (2)

Gaga Samushia

Reputation: 157

MinIO has health check probes:

  • Liveness probe available at /minio/health/live
  • Readiness probe available at /minio/health/ready
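
For a quick manual check, you can hit those endpoints yourself. Below is a minimal Python sketch; the node address (localhost:9000) is an assumption, so adjust it to your deployment.

    # Probe a MinIO node's health endpoints using only the standard library.
    # MINIO_NODE is an assumed address; change it to match your node.
    import urllib.request
    import urllib.error

    MINIO_NODE = "http://localhost:9000"

    def probe(path):
        """Return True if the endpoint answers 200 OK within the timeout."""
        try:
            with urllib.request.urlopen(MINIO_NODE + path, timeout=3) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    print("live: ", probe("/minio/health/live"))
    print("ready:", probe("/minio/health/ready"))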

In a distributed MinIO environment, you can put a reverse proxy in front of your MinIO nodes, for example Caddy, which supports health checking of each backend node.

Here is an example of the Caddy proxy configuration I am using.

If you want TLS termination, /etc/caddy/Caddyfile looks like this:

    http://yourminio.example.com:80 {
        redir https://{host}{uri}
    }

    yourminio.example.com:443 {
        tls /etc/ssl/certs/certbundle.pem /etc/ssl/certs/private.key
        proxy / minio01-ip:9000 minio02-ip:9000 minio03-ip:9000 minio04-ip:9000 {
            header_upstream X-Forwarded-Proto {scheme}
            header_upstream X-Forwarded-Host {host}
            header_upstream Host {host}
            health_check /minio/health/ready
            health_check_interval 3s
        }
    }

Without TLS:

    yourminio.example.com:80 {
        proxy / minio01-ip:9000 minio02-ip:9000 minio03-ip:9000 minio04-ip:9000 {
            header_upstream X-Forwarded-Proto {scheme}
            header_upstream X-Forwarded-Host {host}
            header_upstream Host {host}
            health_check /minio/health/ready
            health_check_interval 3s
        }
    }

A MinIO node can also send metrics to Prometheus, so you can build a Grafana dashboard and monitor the MinIO cluster nodes (disks, CPU, memory, network).
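
As a sketch of what a scrape sees, the snippet below fetches one node's metrics endpoint and filters disk-related lines. The address, the MINIO_PROMETHEUS_AUTH_TYPE=public setting (so no bearer token is needed), and the minio_disk metric prefix are assumptions; check the monitoring guide below for your version.

    # Fetch the Prometheus metrics of one MinIO node and print disk lines.
    # Assumes the node runs at localhost:9000 and was started with
    # MINIO_PROMETHEUS_AUTH_TYPE=public, so the endpoint needs no token.
    import urllib.request

    METRICS_URL = "http://localhost:9000/minio/prometheus/metrics"

    with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
        body = resp.read().decode("utf-8")

    for line in body.splitlines():
        # Example filter; exact metric names depend on the MinIO version.
        if line.startswith("minio_disk"):
            print(line)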

For more, please check the docs:
https://docs.min.io/docs/minio-monitoring-guide.html

https://docs.min.io/docs/setup-caddy-proxy-with-minio.html

Upvotes: 1

TinkerTank

Reputation: 5805

Since MinIO promises read-after-write consistency, I was wondering about its behavior in case of various failure modes of the underlying nodes or network.

The MinIO documentation (https://docs.min.io/docs/distributed-minio-quickstart-guide.html) does a good job explaining how to set it up and how to keep data safe, but there's nothing on how the cluster will behave when nodes are down or (especially) on a flapping / slow network connection, or with disks causing I/O timeouts, etc.

This issue (https://github.com/minio/minio/issues/3536) pointed out that MinIO uses https://github.com/minio/dsync internally for distributed locks.

From the documentation:

A node will succeed in getting the lock if n/2 + 1 nodes (whether or not including itself) respond positively.

I think this is a pretty nice model:

  • Reads will succeed as long as n/2 nodes and disks are available.
  • To perform writes and modifications, nodes wait until they receive confirmation from at least one more than half (n/2 + 1) of the nodes.
  • There's no real node-up tracking / voting / master election or any of that sort of complexity. Nodes are pretty much independent.
  • If we have enough nodes, a node that's down won't have much effect.
  • Even a slow / flaky node won't affect the rest of the cluster much; it won't be among the first half+1 nodes to respond to a lock request, but nobody will wait for it.
  • For unequal network partitions, the largest partition will keep on functioning.
  • For an exactly equal network partition of an even number of nodes, writes could stop working entirely (see the sketch after this list).
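
To make that arithmetic concrete, here is a small Python sketch of the quorum rule as I read it from the dsync docs (illustrative only, not MinIO's actual code):

    # Write locks need n/2 + 1 positive responses (the quoted dsync rule).

    def write_quorum(n):
        """Positive responses needed for a write lock."""
        return n // 2 + 1

    def can_write(n, nodes_up):
        return nodes_up >= write_quorum(n)

    for n in (4, 8, 16):
        q = write_quorum(n)
        print(f"n={n}: write quorum = {q}, tolerates {n - q} nodes down")
        # An exactly equal split of an even n leaves both halves below
        # quorum, so neither side can take write locks:
        print(f"  equal split {n // 2}/{n // 2}: writes = {can_write(n, n // 2)}")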

Please note that if we're connecting clients to a MinIO node directly, MinIO doesn't in itself provide any protection against that node being down. We still need some sort of HTTP load-balancing front-end for an HA setup (which might be nice for asterisk / authentication anyway).

Note 2: this is a bit of guesswork based on the documentation of MinIO and dsync, and on notes in issues and Slack. I haven't actually tested these failure scenarios, which is something you should definitely do if you want to run this in production.

Upvotes: 3
