alex

Reputation: 415

Can I prevent ZeroMQ from occupying file descriptors?

I have a setup where a master process is setting up a ZMQ_ROUTER and then forks many child processes, which then connect to that router.

Whenever a child zmq_connect()'s to the master, one file descriptor is occupied.
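The effect can be observed without ZeroMQ at all. The following is only a sketch, assuming Linux ( it reads /proc/self/fd ) and using plain Unix-domain socket pairs as a stand-in for ZeroMQ connections; the helper names are made up for illustration.

```python
import os
import socket


def open_fd_count():
    """Count this process's open descriptors by listing /proc/self/fd (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


def fds_used_by_connections(n):
    """Open n connected socket pairs and return how many descriptors they added."""
    before = open_fd_count()
    # Each socketpair() is one connection with BOTH endpoints in this process.
    pairs = [socket.socketpair() for _ in range(n)]
    after = open_fd_count()
    for a, b in pairs:
        a.close()
        b.close()
    return after - before


if __name__ == "__main__":
    print(fds_used_by_connections(10))
```

Because both endpoints live in one process here, the count grows by two per connection. In the master / child layout each process holds one endpoint, so the master pays one descriptor per connected child.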

This however limits the number of interacting processes to the number of allowed file descriptors ( per process ). For me ( linux ), this currently is just 1024.
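For reference, the per-process limit can be inspected, and its soft value raised up to the hard value, from inside the process. This is a minimal sketch using Python's standard resource module ( Unix only ); the function name is hypothetical, and going beyond the hard limit still needs administrative changes.

```python
import resource


def raise_fd_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit where that is permitted."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only attempt the change when the hard limit is a finite, larger value.
    if hard != resource.RLIM_INFINITY and soft < hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # the platform refused; keep the current limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_fd_soft_limit())
```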

That is way too small for my intended use ( a multi-agent / swarm simulation ).


Answer:

You can't, except when using an inter-thread socket type ( using an inproc:// transport-class ). All other protocols use one file descriptor per connection.

One newer approach to reduce the number of file descriptors an application needs, if that application offers several services ( e.g. several tcp://<address:port> endpoints that can be connected to ), seems to be the resource property, which allows several services to be combined into one endpoint.

Upvotes: 1

Views: 646

Answers (1)

user3666197

Reputation: 1

The Swarm first:

First of all, a smart solution for a massive herd of agents requires both flexibility ( in the swarm framework design, so that features can be added ) and efficiency ( for both scalability and speed ), so as to achieve the fastest possible simulation run-times in spite of PTIME & PSPACE obstacles, with a possible risk of wandering into the EXPTIME zone in more complex inter-agent communication schemes.


Efficiency next:

At first, my guess was to rather use, and customise a bit, a lighter-weight POSIX-based signalling/messaging framework, nanomsg -- a younger sister of ZeroMQ from Martin SUSTRIK, co-father of ZeroMQ -- where a Context()-less design plus additional features like the SURVEY and BUS messaging archetypes are particularly attractive for swarms with your own software-designed, problem-domain-specific messaging/signalling protocols.


100k+ with file_descriptors

Well, you need courage. It is doable, but roll up your sleeves: it will require hands-on effort in the kernel and tuning of system settings, and you will pay for such scale with increased overheads.

Andrew Hacking has explained both the PROS and CONS of "just" increasing the fd count ( not only on the kernel side of the system tuning and configuration ).

Other factors to consider are that while some software may use sysconf(OPEN_MAX) to dynamically determine the number of files that may be open by a process, a lot of software still uses the C library's default FD_SETSIZE, which is typically 1024 descriptors and as such can never have more than that many files open regardless of any administratively defined higher limit.
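The gap between the two numbers in that quote can be checked from a script. This is a rough sketch only: Python exposes the sysconf value under the name "SC_OPEN_MAX", it does not expose FD_SETSIZE, so the 1024 below is an assumed typical value, and both helper names are hypothetical.

```python
import os

# Assumption: 1024 is the typical C-library FD_SETSIZE; Python does not expose it.
ASSUMED_FD_SETSIZE = 1024


def process_open_max():
    """Descriptor limit the OS reports for this process right now."""
    return os.sysconf("SC_OPEN_MAX")


def usable_with_select(fd, fd_setsize=ASSUMED_FD_SETSIZE):
    """True if descriptor number fd still fits into a select()-style fd_set."""
    return 0 <= fd < fd_setsize


if __name__ == "__main__":
    limit = process_open_max()
    print("sysconf limit:", limit)
    print("highest fd fits select():", usable_with_select(limit - 1))
```

If the administratively raised limit is larger than the assumed FD_SETSIZE, the second line prints False, which is exactly the case the quote warns about for select()-based software.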

Andrew has also directed your kind attention to this, which may serve as an ultimate report on how to set up a system for 100k-200k connections.


Do static scales above 100k per host make any real sense for swarm simulations?

While still "technically" doable, there are further limits -- even nanomsg will not be able to push more than about 1,000,000 [MSGs/s], which is more than enough for most applications, as they cannot keep pace with this native speed of message dispatch. Citations state some ~6 [us] for CPU-core to CPU-core transfer latencies, and if the user-designed swarm-processing application cannot get its sending loop under some 3-4 [us], the performance ceiling is nowhere near becoming an issue.
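The arithmetic behind that last statement, as a small sketch for a single sender ( the function names are made up for illustration; the figures are the ones quoted above, not measurements ):

```python
# Ceiling quoted above for the messaging layer, in messages per second.
CEILING_MSGS_PER_S = 1_000_000


def sender_rate(loop_time_us):
    """Messages per second one sender emits if a loop iteration takes loop_time_us."""
    return 1_000_000 / loop_time_us


def fraction_of_ceiling(loop_time_us, ceiling=CEILING_MSGS_PER_S):
    """Share of the quoted ceiling that a single such sender consumes."""
    return sender_rate(loop_time_us) / ceiling


if __name__ == "__main__":
    for loop_us in (3.0, 4.0):
        print(loop_us, sender_rate(loop_us), fraction_of_ceiling(loop_us))
```

A 4 [us] loop gives 250,000 [MSGs/s] and a 3 [us] loop about 333,000 [MSGs/s], i.e. a quarter to a third of the quoted ceiling per sender.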


How to scale above that?

Distributed multi-host processing is the first dimension along which to attack the static scale of the swarm. Next would be the need to introduce RDMA-injection, so as to escape the performance bottleneck of any stack-processing in the implementation of the distributed messaging / signalling. Yes, this can move your Swarm system into the zone of nanosecond-scale latencies, but at the cost of building an HPC / high-tech computing infrastructure ( which would be a great Project, if your Project sponsor can adjust financing for such an undertaking -- and pls. do let me know if so, I would be more than keen to join such a swarm intelligence HPC-lab ). It is worth knowing about this implication before deciding on an architecture, since knowing the ultimate limits is the key to doing it well from the very beginning.

Upvotes: 3
