Reputation: 415
I have a setup where a master process sets up a ZMQ_ROUTER and then forks many child processes, which then connect to that router.
Whenever a child zmq_connect()'s to the master, one file descriptor is occupied.
This, however, limits the number of interacting processes to the number of allowed file descriptors (per process). For me (Linux), this is currently just 1024.
That is way too small for my intended use (a multi-agent / swarm simulation).
Answer:
You can't, except when using an inter-thread socket type (the inproc:// transport class). All other transport classes use one file descriptor per connection.
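For illustration, here is a minimal sketch of that inproc:// escape hatch, under the assumption that the agents can live as threads rather than forked processes (inproc peers must share one zmq context, so this design does not survive a fork(); the endpoint name inproc://swarm is an arbitrary choice and error handling is omitted):

```c
/* Minimal sketch: agents as threads over inproc:// -- connections here
 * do not consume one file descriptor each, unlike tcp:// or ipc://.
 * Assumes libzmq; the endpoint name "inproc://swarm" is arbitrary. */
#include <zmq.h>
#include <pthread.h>
#include <stdio.h>

static void *ctx;                 /* inproc peers must share one context */

static void *agent(void *arg)
{
    (void)arg;
    void *s = zmq_socket(ctx, ZMQ_DEALER);
    zmq_connect(s, "inproc://swarm");        /* no per-connection fd */
    zmq_send(s, "hello", 5, 0);
    zmq_close(s);
    return NULL;
}

int main(void)
{
    ctx = zmq_ctx_new();
    void *router = zmq_socket(ctx, ZMQ_ROUTER);
    zmq_bind(router, "inproc://swarm");      /* bind before connect:
                                                required for inproc on
                                                older libzmq versions */
    pthread_t t;
    pthread_create(&t, NULL, agent, NULL);

    char id[256], msg[256];
    zmq_recv(router, id, sizeof id, 0);           /* frame 1: routing id */
    int n = zmq_recv(router, msg, sizeof msg, 0); /* frame 2: payload    */
    printf("router got: %.*s\n", n, msg);

    pthread_join(t, NULL);
    zmq_close(router);
    zmq_ctx_destroy(ctx);
    return 0;
}
```

Note that each zmq socket still carries some internal plumbing of its own; the saving is per connection, not per socket.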
One newer approach to reduce the number of file descriptors needed per application, if that application exposes several services (e.g. several tcp://<address:port> endpoints that can be connected to), seems to be the resource property, which allows one to combine several services onto one endpoint.
Upvotes: 1
Views: 646
Reputation: 1
First of all, a smart solution for a massive herd of agents requires both flexibility (in the swarm-framework design, for adding features) and efficiency (for both scalability and speed), so as to achieve the fastest possible simulation run-times in spite of PTIME and PSPACE obstacles, with a possible risk of wandering into the EXPTIME zone in more complex inter-agent communication schemes.
At the first moment, my guess was to rather use and customise the somewhat more lightweight POSIX-based signalling/messaging framework nanomsg -- a younger sister of ZeroMQ from Martin SUSTRIK, co-father of ZeroMQ -- where a Context()-less design plus additional features like the SURVEY and BUS messaging archetypes are of particular attractivity for swarms with your own software-designed, problem-domain-specific messaging/signalling protocols.
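To make the BUS archetype concrete, here is a minimal sketch against the nanomsg C API (the endpoint name inproc://swarm-bus is an arbitrary assumption; error handling is omitted): every node directly connected to the bus hears what any other node says, which maps naturally onto agent-to-agent swarm chatter.

```c
/* Minimal nanomsg BUS sketch: every connected peer receives what any
 * other peer sends. Endpoint name "inproc://swarm-bus" is arbitrary. */
#include <nanomsg/nn.h>
#include <nanomsg/bus.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int a = nn_socket(AF_SP, NN_BUS);        /* note: no Context() needed */
    int b = nn_socket(AF_SP, NN_BUS);

    nn_bind(a, "inproc://swarm-bus");
    nn_connect(b, "inproc://swarm-bus");
    usleep(100000);                          /* crude guard: let the async
                                                connect establish first  */

    nn_send(a, "ping", 4, 0);                /* a broadcasts on the bus */

    char buf[16];
    int n = nn_recv(b, buf, sizeof buf, 0);  /* b, a connected peer, hears it */
    printf("b got: %.*s\n", n, buf);

    nn_close(b);
    nn_close(a);
    return 0;
}
```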
file_descriptors? Well, you need courage. Doable, but sleeves up: it will require hands-on effort in the kernel and tuning of system settings, and you will pay for having such scale with increased overheads.
Other factors to consider: while some software may use sysconf(_SC_OPEN_MAX) to dynamically determine the number of files that may be open by a process, a lot of software still uses the C library's default FD_SETSIZE, which is typically 1024 descriptors, and as such can never have more than that many files open regardless of any administratively defined higher limit.
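As a concrete illustration of those two limits, here is a minimal Linux/POSIX sketch that queries the run-time value and tries to lift the soft limit up to the administratively defined hard one (raising the hard limit itself needs privileges, e.g. via ulimit or /etc/security/limits.conf):

```c
/* Minimal Linux/POSIX sketch: inspect and raise the per-process fd limit. */
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    printf("sysconf(_SC_OPEN_MAX) = %ld\n", sysconf(_SC_OPEN_MAX));

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("soft = %llu, hard = %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        rl.rlim_cur = rl.rlim_max;           /* lift soft limit to hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");             /* going beyond the hard limit
                                                requires privileges */
    }
    return 0;
}
```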
While still "technically" doable, there are further limits -- even nanomsg will not be able to push much more than about 1,000,000 [MSGs/s], which is fair enough for most applications that cannot keep pace with this native speed of message dispatch anyway. Citations state some ~6 [us] for CPU-core-to-CPU-core transfer latencies; if the user-designed swarm-processing application cannot get its sending loop under some 3-4 [us] per message (i.e. roughly 250,000-330,000 [MSGs/s]), the library's performance ceiling is nowhere close to causing an issue.
Distributed multi-host processing is the first dimension along which to attack the static scale of the swarm. Next would be the need to introduce RDMA-injection, so as to escape the performance bottleneck of any stack processing in the implementation of the distributed messaging/signalling. Yes, this can move your swarm system into the nanosecond-scale latency zone, but at the cost of building an HPC-grade infrastructure (which would be a great Project, if your Project sponsor can finance such an undertaking -- and pls. do let me know if yes, I would be more than keen to join such a swarm-intelligence HPC lab). It is worth knowing about this implication before deciding on an architecture: knowing the ultimate limits is the key to doing it well from the very beginning.
Upvotes: 3