Reputation: 1

Integrate a Thrift implementation of a distributed data system (client, servers) with the Raft protocol

So, first of all, sorry for my English. I'm not a native speaker.

I already have an implementation of a client-server application with distributed data (3 servers) using Thrift. The last phase of the project is to use some implementation of Raft to replicate each server (since I'm using Java, one option is Copycat). But Thrift creates the servers and clients in its own way (something like Grafosd.Client client = new...), where Grafosd is generated by Thrift. Also, does Thrift store the data in the Handler? Copycat creates the server and client in a different way (something like CopycatClient client = builder.build();), and the data is stored in a StateMachine?

So I'm having difficulty integrating the two. Has anyone already used a Thrift client-server setup with some implementation of the Raft protocol? (Not necessarily Copycat; it can be any implementation of Raft in Java.)

Upvotes: 0

Answers (2)

JensG

Reputation: 13421

Some more general remarks on your question from my side:

But Thrift creates the servers and clients in its own way (something like Grafosd.Client client = new...), where Grafosd is generated by Thrift.

Thrift itself is (only) the serialization and RPC mechanism that is used. More complicated protocols or APIs are usually designed on top of Thrift, using Thrift - but not inside Thrift. It's like using a car to transport material to a building site. It is not the car that determines the architecture. The car is only the means to get the job done.

In that regard, Thrift (or any similar mechanism) is only a tool. I would suggest first getting clear on which piece of the puzzle belongs where, so that you get the best out of your system's design.

Also, does Thrift store the data in the Handler?

I'd recommend always making handlers stateless. If you need state, that's fine, but store it somewhere else. Thrift itself stores nothing. It's the handler implementation, which is in the hands of the server-side developer, that may need to store state or other information.
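A minimal sketch of a stateless handler, assuming a made-up service interface. GraphService stands in for whatever interface Thrift generates from your IDL (for example Grafosd.Iface); VertexStore, InMemoryVertexStore and GraphHandler are hypothetical names used only for illustration:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the interface Thrift would generate from the IDL.
interface GraphService {
    void putVertex(int id, String label);
    String getVertex(int id);
}

// The state lives behind this interface, not inside the handler.
interface VertexStore {
    void put(int id, String label);
    Optional<String> get(int id);
}

// Simplest possible store; it could later be swapped for a replicated one.
final class InMemoryVertexStore implements VertexStore {
    private final Map<Integer, String> vertices = new ConcurrentHashMap<>();

    public void put(int id, String label) {
        vertices.put(id, label);
    }

    public Optional<String> get(int id) {
        return Optional.ofNullable(vertices.get(id));
    }
}

// The handler keeps no data of its own; it only translates RPC calls into store calls.
final class GraphHandler implements GraphService {
    private final VertexStore store;

    GraphHandler(VertexStore store) {
        this.store = store;
    }

    public void putVertex(int id, String label) {
        store.put(id, label);
    }

    public String getVertex(int id) {
        // An empty string signals "not found" in this sketch.
        return store.get(id).orElse("");
    }
}
```

Because the handler only holds a reference to the store, two handler instances that share a store see the same data, and the store implementation can change without touching the RPC layer.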

Upvotes: 1

kuujo

Reputation: 8195

First you have to ask yourself why the last phase of your project is to use a consensus algorithm. Does the project require strong consistency? Have you considered alternative replication protocols (gossip, primary-backup, etc.)?

Regardless of which Raft implementation you use, the way most implementations model state within the system is as a state machine. Changes to the state of the system must go through the Raft protocol to the leader and be replicated to followers, and queries on the state of the system must go through the protocol as well if you want to preserve consistency/fault tolerance guarantees.
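To make the state machine model concrete, here is a toy sketch in plain Java. It is not Raft: there are no elections, no networking and no failure handling. It only shows that every change is appended to one ordered log and applied to each replica's state machine in that order. All class names (PutCommand, KeyValueStateMachine, ToyCluster) are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A command describes one deterministic change to the state.
final class PutCommand {
    final String key;
    final String value;

    PutCommand(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Each replica owns one state machine; it is mutated only by applying log entries.
final class KeyValueStateMachine {
    private final Map<String, String> data = new HashMap<>();
    private int appliedCount = 0;

    void apply(PutCommand command) {
        data.put(command.key, command.value);
        appliedCount++;
    }

    String get(String key) {
        return data.get(key);
    }

    int appliedCount() {
        return appliedCount;
    }
}

// Toy stand-in for a cluster: one shared log, applied to all replicas in order.
final class ToyCluster {
    private final List<PutCommand> log = new ArrayList<>();
    private final List<KeyValueStateMachine> replicas = new ArrayList<>();

    ToyCluster(int size) {
        for (int i = 0; i < size; i++) {
            replicas.add(new KeyValueStateMachine());
        }
    }

    // "Leader" path: append to the log, then apply to every replica in log order.
    void submit(PutCommand command) {
        log.add(command);
        for (KeyValueStateMachine replica : replicas) {
            replica.apply(command);
        }
    }

    String query(int replicaIndex, String key) {
        return replicas.get(replicaIndex).get(key);
    }

    int logSize() {
        return log.size();
    }
}
```

Since every replica applies the same commands in the same order, all replicas end up in the same state. A real implementation adds leader election, log persistence and commit rules on top of this idea.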

If you want to embed Copycat within a server, just use a LocalTransport, which allows you to communicate with an in-process server. The CopycatServer doesn't have to run on a remote machine. It's perfectly reasonable to embed the Copycat client and server within your Thrift server. Within your Thrift server, create a CopycatServer that contains a state machine that can represent changes to the state of your system, and a CopycatClient that uses a LocalTransport to communicate with the local server.

You might also consider using Atomix, in which AtomixReplica handles this local client/server embedding pattern for you. It also includes a plethora of example state machines and client APIs.
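The shape of that embedding can be sketched without the library. The code below deliberately does not use the real Copycat API: ReplicationClient and InProcessReplicationClient are hypothetical stand-ins for a client that talks to an in-process server, and ReplicatedHandler stands in for a Thrift handler. The only assumption carried over is that submitting an operation is asynchronous and returns a CompletableFuture:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Raft client; both writes and reads go through it.
interface ReplicationClient {
    CompletableFuture<Void> submitPut(String key, String value);
    CompletableFuture<String> submitGet(String key);
}

// Stand-in for a client wired to a server in the same process (no network hop).
// A real client would complete the future only after the entry is committed.
final class InProcessReplicationClient implements ReplicationClient {
    private final Map<String, String> stateMachine = new ConcurrentHashMap<>();

    public CompletableFuture<Void> submitPut(String key, String value) {
        stateMachine.put(key, value);
        return CompletableFuture.completedFuture(null);
    }

    public CompletableFuture<String> submitGet(String key) {
        return CompletableFuture.completedFuture(stateMachine.get(key));
    }
}

// Stand-in for the Thrift handler: it owns no state and forwards every call.
final class ReplicatedHandler {
    private final ReplicationClient client;

    ReplicatedHandler(ReplicationClient client) {
        this.client = client;
    }

    void put(String key, String value) {
        // Block the RPC thread until the replicated write has completed.
        client.submitPut(key, value).join();
    }

    String get(String key) {
        String value = client.submitGet(key).join();
        return value == null ? "" : value;
    }
}
```

With the real library, the in-process stand-in would be replaced by a client and server built with the library's own builders and connected through its local transport; check the Copycat documentation for the exact calls.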

But as I said, regardless of whether you use Copycat/Atomix or another Raft implementation, you'll still have to model state changes in the same way. Each change to the state of the system must be submitted to a Raft leader, where it's logged, replicated to followers, and applied to a state machine. The state machine replication model is well suited to stateful systems. An alternative for systems that store large amounts of state or need to keep state in an external database is persistent state machines. I find that this is what many users are looking for in Raft. But you have to be careful with how persistent state machines are implemented within a Raft cluster, otherwise you risk duplicating writes.
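One common way to avoid duplicate writes, shown here as a hedged sketch and not as how Copycat or Atomix does it, is to store the index of the last applied log entry together with the data and skip any entry at or below it when the log is replayed. FakeDatabase and PersistentCounterStateMachine are hypothetical names, and a plain map stands in for the external database:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an external database whose contents survive a process restart.
final class FakeDatabase {
    final Map<String, Integer> counters = new HashMap<>();
    long lastAppliedIndex = 0;
}

// State machine whose state lives in the database instead of in memory.
final class PersistentCounterStateMachine {
    private final FakeDatabase db;

    PersistentCounterStateMachine(FakeDatabase db) {
        this.db = db;
    }

    // Applies "increment key" for the log entry at the given index.
    // Returns false when the entry was already applied and is skipped.
    boolean applyIncrement(long index, String key) {
        if (index <= db.lastAppliedIndex) {
            return false; // replayed entry: applying it again would double count
        }
        db.counters.merge(key, 1, Integer::sum);
        // In a real database, the data change and this index update
        // would have to be committed in the same transaction.
        db.lastAppliedIndex = index;
        return true;
    }
}
```

If a node restarts and the log is replayed from the beginning, entries 1 and 2 are skipped and only new entries change the counter. Without the index check, every replay would increment the counter again.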

Still, you should first determine whether a complex protocol like Raft is necessary for the problem you're trying to solve. First answer what that problem is, and what it requires from a replication protocol. Do you need partition tolerance? Do you need strong consistency? Do you need high availability? Do throughput requirements preclude using a leader-based protocol? Why not just write to any external database that's replicated?

I am the author of Copycat and Atomix. Feel free to join us on chat when you answer some of those questions and determine whether this is indeed the right direction to go.

Upvotes: 2
