Erlang remote procedure call module internals

Question

I have Several Erlang applications on Node A and they are making rpc calls to Node B onto which i have Mnesia Stored procedures (Database querying functions) and my Mnesia DB as well. Now, occasionally, the number of simultaneous processes making rpc calls to Node B for data can rise to 150. Now, i have several questions:

Question 1: For each rpc call to a remote Node, does Node A make a completely new (say TCP/IP or UDP connection or whatever they use at the transport) CONNECTION? or there is only one connection and all rpc calls share this one (since Node A and Node B are connected [got to do with that epmd process])?

Question 2: If i have data centric applications on one node and i have a centrally managed Mnesia Database on another and these Applications' tables share the same schema which may be replicated, fragmented, indexed e.t.c, which is a better option: to use rpc calls in order to fetch data from Data Nodes to Application nodes or to develope a whole new framework using say TCP/IP (the way Scalaris guys did it for their Failure detector) to cater for network latency problems?

Question 3: Has anyone out there ever tested or bench marked the rpc call efficiency in a way that can answer the following?
(a) What is the maximum number of simultaneous rpc calls an Erlang Node can push onto another without breaking down?
(b) Is there a way of increasing this number, either by a system configuration or operating system setting? (refer to Open Solaris for x86 in your answer)
(c) Is there any other way of applications to request data from Mnesia running on remote Erlang Nodes other than rpc? (say CORBA, REST [requires HTTP end-to-end], Megaco, SOAP e.t.c)

archaelus · Accepted Answer

Mnesia runs over erlang distribution, and in Erlang distribution there is only one tcp/ip connection between any pair of nodes (usually in a fully mesh arrangement, so one connection for every pair of nodes). All rpc/internode communication will happen over this distribution connection.

Additionally, it's guaranteed that message ordering is preserved between any pair of communicating processes over distribution. Ordering between more than two processes is not defined.

Mnesia gives you a lot of options for data placement. If you want your persistent storage on node B, but processing done on node A, you could have disc_only_copies of your tables on B and ram_copies on node A. That way applications on node A can get quick access to data, and you'll still get durable copies on node B.

I'm assuming that the network between A and B is a reliable LAN that is seldom going to partition (otherwise you're going to spend a bunch of time getting mnesia back online after a partition).

If both A and B are running mnesia, then I would let mnesia do all the RPC for me - this is what mnesia is built for and it has a number of optimizations. I wouldn't roll my own RPC or distribution mechanism without a very good reason.

As for benchmarks, this is entirely dependent on your hardware, mnesia schema and network between nodes (as well as your application's data access patterns). No one can give you these benchmarks, you have to run them yourself.

As for other RPC mechanisms for accessing mnesia, I don't think there are any out of the box, but there are many RPC libraries you could use to present the mnesia API to the network with a small amount of effort on your part.

Erlang remote procedure call module internals

Answers (1)

Related Questions