munish

Reputation: 485

ZooKeeper reads are not fully consistent as per the documentation, but is creating a znode fully consistent?

Below are my assumptions/queries. Please point out if there is something wrong in my understanding.

By reading the documentation I understood that:

  1. ZooKeeper writes go to the leader and are replicated to the followers. A read request can be served by a follower itself, and hence reads can be stale.
  2. Why can't we use ZooKeeper as a cache system?
  3. As a write request is always made/redirected to the leader, node creation is consistent. When two clients send a create request for the same node name, one of them will ALWAYS get an error (NodeExistsException).
  4. If the above is true, then can we use ZooKeeper to keep track of duplicate requests by creating a znode with the requestId?
  5. For generating a sequence number in a distributed system, we can use sequential node creation.
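The atomic-create pattern behind points 3-5 can be sketched as follows. This is a minimal in-memory stand-in for ZooKeeper's create semantics, not a real client: `ZnodeStore`, `NodeExistsError`, and the paths are illustrative names, and against a real ensemble you would use a client library (e.g. kazoo) where `create()` raises a node-exists error for the losing client:

```python
class NodeExistsError(Exception):
    """Raised when a znode with the given path already exists."""

class ZnodeStore:
    """In-memory stand-in for ZooKeeper's atomic znode creation."""
    def __init__(self):
        self._nodes = {}
        self._seq = 0

    def create(self, path, data=b"", sequence=False):
        # Sequential nodes get a monotonically increasing zero-padded
        # suffix, mirroring ZooKeeper's SEQUENTIAL flag (point 5).
        if sequence:
            path = f"{path}{self._seq:010d}"
            self._seq += 1
        if path in self._nodes:
            raise NodeExistsError(path)  # exactly one creator wins (point 3)
        self._nodes[path] = data
        return path

def process_refund(store, request_id):
    """Return True if this request is new, False if it is a duplicate (point 4)."""
    try:
        store.create(f"/refunds/{request_id}")
        return True
    except NodeExistsError:
        return False

store = ZnodeStore()
print(process_refund(store, "req-42"))          # True  - first attempt wins
print(process_refund(store, "req-42"))          # False - duplicate detected
print(store.create("/ids/id-", sequence=True))  # /ids/id-0000000000
```

Because create-if-absent is decided by the leader, the duplicate check in point 4 does not suffer from the stale-read problem in point 1.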

Upvotes: 1

Views: 577

Answers (1)

inquisitive

Reputation: 3629

Based on the information available in the question and the comments, it appears that the basic question is: in a stateless multi-server architecture, how best to prevent data duplication? Here the data is "has this refund been processed?"

This qualifies as "primarily opinion based". There are multiple ways to do this and no one way is the best. You can do it with MySQL and you can do it with Zookeeper.

Now comes pure opinion and speculation:

To process a refund, there must be some database somewhere, so why not just check against it? The duplicate-request scenario you are preparing against seems like a rare occurrence - it won't be happening a hundred times per second. If so, this scenario does not warrant a high-performance implementation; a simple database lookup should be fine.

Your workload seems to be a 1:1 read:write ratio: every time a refund is processed, you check whether it has already been processed, and if not, you process it and make an entry for it. ZooKeeper itself says it works best at something like a 10:1 read:write ratio. While no such metric is available for MySQL, it does not need to make certain* guarantees that ZooKeeper makes for write activity, so I would expect it to fare better under pure write-intensive loads. (*Guarantees like sequentiality, broadcast, consensus, etc.)

Just a nitpick, but your data is a linear list of hundreds (thousands? millions?) of transaction IDs. This is exactly what MySQL (or any database) and its primary key are built for. ZooKeeper is made for more complex/powerful hierarchical data, which you do not need.
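The primary-key approach can be sketched with a quick example. Here sqlite3 (Python's standard library) stands in for MySQL, and the table and column names are made up for illustration; the same insert-and-catch-the-constraint-violation pattern applies to any relational database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_refunds (request_id TEXT PRIMARY KEY)")

def try_claim(conn, request_id):
    """Atomically claim a request id; False means it was a duplicate."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO processed_refunds (request_id) VALUES (?)",
                (request_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: some other worker already claimed it.
        return False

print(try_claim(conn, "refund-1001"))  # True  - first processor wins
print(try_claim(conn, "refund-1001"))  # False - duplicate rejected
```

Because the uniqueness check and the insert are a single atomic statement, two stateless servers racing on the same refund ID cannot both "win", which is the same guarantee the znode-creation scheme would give you.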

Upvotes: 2
