k-kawa
k-kawa

Reputation: 1417

How good are ZooKeeper and Etcd?

Disclaimer: I'm quite new for the etcd project and ZooKeeper project.

I'm recently getting interested in the distributed open source products. I found they seems to require configuration(coordination?) systems such as ZooKeeper for Presto DB, Hive and Etcd for kubernetes and I think that understanding the role of etcd and ZooKeeper is the first step to understand the distributed systems.

But now, I feel like getting lost... I could not yet understand what is the good and unique points of the etcd and ZooKeeper. They look for me a well-distributed key-value storage or file systems. Here is the impression that I have for the products. I know the impressions don't reflect the feature of the products. but I don't know what is the remaining feature that I should know.

ZooKeeper: According to the overview page of ZooKeeper, it guarantees the following things.

The sequential consistency and atomicity are the unique features which is not supported by most file systems but others are common among other file systems.

Etcd: According to the README of etcd. it focuses on

Most of them seems common with Amazon S3 (S3 doesn't support such a fast access.)

I know those products are very good ones because most of the distributed open source products depend on them. but what is the key, unique feature that the distributed open source product choose them?

Upvotes: 11

Views: 5668

Answers (2)

IronWidget
IronWidget

Reputation: 88

Short answer: they are consistent and highly available (CA from the CAP theorem) databases that offer distributed leader election, locking, and other concurrency primitives via total order broadcast. But in exchange, the strong CA guarantees are overkill for most other operations; 1000 writes/second/instance is fast for ZooKeeper but horrible for Redis/PostgreSQL/Mongo. Most distributed systems use ZK or Etcd for high availability features and then use systems with weaker guarantees like Redis for bulk storage where corners can be cut to boost performance.

For some background: a common challenge in distributed systems is coordination, especially guaranteeing consistency. For example a frequent situation is only one thread on the cluster at a time should be performing a privileged action like accepting writes to a shard or moving data. If not then a "split brain" scenario can arise where nodes in the cluster disagree on the current value, or even worse, data loss that can arise if multiple nodes try conflicting operations on the same data at the same time. On a single node and a single process we can coordinate threads using concurrency primitives like locks. But in a networked setup with many processes spanning several nodes we don't have these concurrency primitives - unless we deploy ZooKeeper or Etcd!

More theoretical answer: ZooKeeper and Etcd are implementations of a distributed primitive called total order broadcast. It allows a group of nodes to all agree on one total ordering of messages in a channel. Martin Kleppmann in his excellent book Designing Data Intensive Applications explains this in more detail and shows how you can use total order broadcast to create distributed locks and other concurrency primitives. These locks then allow nodes in the cluster to elect leaders and coordinate in a way all nodes agree with the results. As long as more than half the nodes are functioning normally a ZK/Etcd quorum will be functioning and available. Losing half or more of the nodes will stall the quorum and prevent new messages from being broadcast until over half the nodes are alive again.

All the above are the pros of using ZK and Etcd. A major con is total order broadcast is overkill for most other use cases. Systems with weaker guarantees can cut corners to offer dramatically higher performance. For example:

  • if you want maximum speed from an in-memory database and can tolerate some data loss and stale reads, consider Redis
  • if you need strong consistency guarantees but are willing to accept downtime if fewer nodes fail at the same time, consider SQL databases

Another downside of ZooKeeper (and Etcd I assume, but I don't have hands-on experience with it) is they're hard to use correctly. For example, the ZK lock implementation in Apache Curator requires ZK's monotonically increasing suffixes feature, and watch flags, and tricks for avoiding one-to-all notifications, and checking for edge cases. Fortunately the Curator project has implemented the most common concurrency primitives already so users don't have to come face to face with the hidden complexities of ZK. For this reason the vast majority of people use ZK indirectly through another product like Curator or Spark or Kubernetes instead of wrangling with it directly.

Upvotes: 1

kuujo
kuujo

Reputation: 8185

I think you're confusing the file-system-like interface with an actual file system. The systems you are mentioning are well suited for cluster coordination, in particular ZooKeeper. What they are not designed for is storing large amounts of data like a file system would. You should think of them more as suited for coordinating a file system. That is, one could imagine a file system storing paths to files in a consistent store like ZooKeeper or etcd, but not the files themselves. That they expose a file system-like interface does not correlate to any ability to store files. Indeed, these systems are designed to store small amounts of data that can be held in memory. By using a consistent store like ZooKeeper for storing file information in a distributed file system, the file system would ensure that clients see changes in the file system in sequential order.

ZooKeeper is really a set of primitives with which distributed systems can be coordinated. Particularly relevant to coordinating distributed systems with ZooKeeper are its session events (watches) which allow clients to listen for changes to the cluster state. Distributed systems typically use watches in ZooKeeper for things like locks, and the strong consistency guarantees of ZooKeeper make it perfectly suitable for that use case.

If you want a good idea of what systems like ZooKeeper and etcd are used for, you should check out the Apache Curator recipes. Atomix also implements similar types of APIs for coordinating distributed systems on top of a consensus algorithm. All of these tools are demonstrative of typical use cases for consensus-based distributed systems.

What's important to note is that these types of systems are built on top of consensus algorithms and usually store state in memory. They're suitable for operations that involve a small amount of data but require a high level of consistency, and that's why they're frequently used for things like distributed locking, configuration management, and group membership.

Upvotes: 13

Related Questions