Reputation: 137
I have a ZeroMQ ROUTER/DEALER pair of Formal Communication Archetypes, used for asynchronous communication between two computers.
If the computer with the DEALER socket goes offline for a while and comes back, some messages are lost.
I do understand that ZeroMQ can't hold the messages indefinitely, as there is no guarantee the DEALER side is ever coming back. I am looking for ways to configure this behavior: is there a setting I can use to control how long the messages are kept before giving up?
What settings might affect this behavior?
I don't think the issue is related to a value of the High Water Mark setting, as the amount of the data transferred is quite low.
ZeroMQ version 4.0.4 on Windows.
I'm not sure exactly which parts of the code would be relevant to show. Also, it is not entirely straightforward to lift things out of context and keep them understandable. I'll try, so here goes.
This is how the router socket is initialized:
router-socket (doto (zmq/socket zmq-context :router)
                (zmq/set-receive-timeout 100)
                (zmq/set-recv-hwm 0)
                (zmq/set-send-hwm 0)
                (zmq/bind (str "tcp://*:" (:port request-handler))))
Sending the messages uses the zmq/send function.
Even with no Clojure experience, the ZeroMQ parts should be clear.
Here's how the dealer is initialized (C#):
var dealer = context.CreateSocket( SocketType.DEALER );
dealer.SendHighWatermark = 0;
dealer.ReceiveHighWatermark = 0;
Reception of the messages uses the ZmqSocket.ReceiveMessage method (well, actually it's an extension method in the SendReceiveExtensions class, but anyway).
One (possibly the main) case where the message loss occurs is when the computer running the dealer goes to sleep (i.e. the laptop lid is closed) and is woken up some minutes afterwards. As I stated in my original question, I am assuming that the messages are lost because the router temporarily gives up on the dealer coming back and therefore discards them. But this is only an assumption; the cause may be something else, too.
Upvotes: 2
Views: 2103
Reputation: 1
So, let's start with some polishing touches on the settings of the ZeroMQ Context() engine, which is the ultimate authority for running all the low-level smart-signaling / messaging machinery in ZeroMQ, using the powers of the .setsockopt() method.
For real-world troubleshooting there is no one-size-fits-all answer, so without any code above, much can only be approached by guesstimate and by experience from past troubles.
While some root causes may be masked by other things the ZeroMQ processes do under the hood, the following text is meant as a broad view of the balancing act involved rather than a step-by-step guide.
From just the few remarks above, I would start with these suspects from the long list of API options:
ZMQ_RECONNECT_IVL: Set reconnection interval
The ZMQ_RECONNECT_IVL option shall set the initial reconnection interval for the specified socket. The reconnection interval is the period ØMQ shall wait between attempts to reconnect disconnected peers when using connection-oriented transports.
Going shorter, from ~100 [ms] to some 2 [ms], with ZMQ_RECONNECT_IVL_MAX adjusted to a few multiples thereof, may, together with the strategies mentioned below for surviving a spurious LoS and similar service dropouts, help reduce the latency overhead of renewing lost low-level connections.
Refer also to ZMQ_TCP_MAXRT and the O/S overrides (available where the O/S supports them) via ZMQ_TCP_KEEPALIVE_{CNT | IDLE | INTVL}.
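To get a feel for how ZMQ_RECONNECT_IVL and ZMQ_RECONNECT_IVL_MAX interact, here is a minimal Python sketch of the doubling rule described in the libzmq documentation (the interval is doubled on each attempt until the maximum is reached). The function name reconnect_schedule is made up for illustration, and the sketch ignores any randomisation the library may add, so treat the numbers as approximate.

```python
def reconnect_schedule(ivl_ms, ivl_max_ms, attempts):
    """Approximate waits (in ms) before each reconnect attempt.

    Illustrative model only: the wait starts at ivl_ms and doubles after
    every attempt until it reaches ivl_max_ms. If ivl_max_ms is not larger
    than ivl_ms (the default of 0 included), no backoff is applied.
    """
    schedule = []
    current = ivl_ms
    for _ in range(attempts):
        schedule.append(current)
        if ivl_max_ms > ivl_ms:
            current = min(current * 2, ivl_max_ms)
    return schedule


if __name__ == "__main__":
    # 2 ms initial interval, capped at 16 ms
    print(reconnect_schedule(2, 16, 6))    # [2, 4, 8, 16, 16, 16]
    # default-like settings: constant 100 ms, no backoff
    print(reconnect_schedule(100, 0, 3))   # [100, 100, 100]
```

With a cap of only a few multiples of the initial interval, the socket keeps retrying at short intervals instead of drifting towards long waits.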
This one will highlight the states when peers are not connected, so that the message delivery strategy can be adjusted for such cases in the user-application code:
ZMQ_IMMEDIATE: Queue messages only to completed connections
By default queues will fill on outgoing connections even if the connection has not completed. This can lead to "lost" messages on sockets with round-robin routing (REQ, PUSH, DEALER). If this option is set to 1, messages shall be queued only to completed connections. This will cause the socket to block if there are no other connections, but will prevent queues from filling on pipes awaiting connection.
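As a rough illustration of how these options could be applied on the DEALER side, here is a sketch using pyzmq (the question itself uses Clojure and C# bindings, so this is only an analogy). The option names are real libzmq options; the values, the endpoint, and the helper name make_dealer are assumptions for the example and would need tuning against the real setup.

```python
# Option names are libzmq socket options; the values are illustrative guesses.
DEALER_OPTIONS = {
    "IMMEDIATE": 1,           # queue only to completed connections
    "RECONNECT_IVL": 2,       # initial reconnect interval, in ms
    "RECONNECT_IVL_MAX": 16,  # cap for the doubling backoff, in ms
}


def make_dealer(endpoint, options=DEALER_OPTIONS):
    """Create a DEALER socket, apply the options, then connect.

    pyzmq is imported lazily so the option table can be inspected
    without the library being installed.
    """
    import zmq  # pyzmq, assumed to be installed

    sock = zmq.Context.instance().socket(zmq.DEALER)
    for name, value in options.items():
        # Options are set before connect() so they apply to the connection.
        sock.setsockopt(getattr(zmq, name), value)
    sock.connect(endpoint)
    return sock


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the real ROUTER address.
    dealer = make_dealer("tcp://router-host:5555")
    dealer.close(linger=0)
```

Note that with ZMQ_IMMEDIATE set, a send on a socket with no completed connection blocks (or fails in non-blocking mode), which is what lets the application notice the disconnected state.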
With a similar strategy to the one posted above for the DEALER side, the ROUTER side may benefit from the ZMQ_PROBE_ROUTER setting (set on the socket that connects to the ROUTER), which is used to bootstrap connections to ROUTER sockets.
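When ZMQ_PROBE_ROUTER is enabled on a connecting socket, it sends an empty message on each new connection, and the application on the ROUTER side has to filter these out. A minimal sketch of such a filter follows; it assumes a probe arrives as the peer identity frame followed by exactly one empty frame, and the function name classify_router_message is hypothetical.

```python
def classify_router_message(frames):
    """Split a multipart message received on a ROUTER socket.

    Returns (identity, is_probe, payload). A probe is assumed to be the
    identity frame followed by exactly one empty frame.
    """
    if not frames:
        raise ValueError("a ROUTER message starts with an identity frame")
    identity, payload = frames[0], list(frames[1:])
    is_probe = payload == [b""]
    return identity, is_probe, payload


if __name__ == "__main__":
    identity, is_probe, _ = classify_router_message([b"peer-1", b""])
    if is_probe:
        # The peer has (re)connected: a good moment to resend anything
        # the application still considers undelivered to this identity.
        print("probe from", identity)
```

Seeing a probe from a known identity is a cheap signal that the peer has come back, which the application can use to trigger its own resend logic.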
If possible in principle, and if still reasonable cost-wise, one may use a sort of "service" sonar-beeps, injected at regular intervals (note that the heartbeat options below require libzmq 4.2 or newer):
ZMQ_HEARTBEAT_IVL: Set interval between sending ZMTP heartbeats
The ZMQ_HEARTBEAT_IVL option shall set the interval between sending ZMTP heartbeats for the specified socket. If this option is set and is greater than 0, then a PING ZMTP command will be sent every ZMQ_HEARTBEAT_IVL milliseconds.
ZMQ_HEARTBEAT_TIMEOUT: Set timeout for ZMTP heartbeats
The ZMQ_HEARTBEAT_TIMEOUT option shall set how long to wait before timing-out a connection after sending a PING ZMTP command and not receiving any traffic. This option is only valid if ZMQ_HEARTBEAT_IVL is also set, and is greater than 0. The connection will time out if there is no traffic received after sending the PING command, but the received traffic does not have to be a PONG command - any received traffic will cancel the timeout.
ZMQ_HEARTBEAT_TTL: Set the TTL value for ZMTP heartbeats
The ZMQ_HEARTBEAT_TTL option shall set the timeout on the remote peer for ZMTP heartbeats. If this option is greater than 0, the remote side shall time out the connection if it does not receive any more traffic within the TTL period. This option does not have any effect if ZMQ_HEARTBEAT_IVL is not set or is 0. Internally, this value is rounded down to the nearest decisecond, any value less than 100 will have no effect.
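The interplay of the three heartbeat options is easy to get wrong, so here is a small Python sketch that computes which values would actually take effect under the rules quoted above. It is only a model of the documented behaviour, not library code; the function name effective_heartbeat and the use of 0 to mean "disabled" are conventions chosen for this example.

```python
def effective_heartbeat(ivl_ms, timeout_ms, ttl_ms):
    """Model which heartbeat settings take effect, per the quoted rules.

    - Timeout and TTL only matter when the interval is greater than 0.
    - The TTL is rounded down to a whole decisecond (100 ms), so any
      value below 100 ms ends up as 0, i.e. no effect.
    A value of 0 in the result means "disabled / no effect".
    """
    if ivl_ms <= 0:
        return {"ivl_ms": 0, "timeout_ms": 0, "ttl_ms": 0}
    return {
        "ivl_ms": ivl_ms,
        "timeout_ms": timeout_ms,
        "ttl_ms": (ttl_ms // 100) * 100,
    }


if __name__ == "__main__":
    print(effective_heartbeat(1000, 3000, 2550))
    # {'ivl_ms': 1000, 'timeout_ms': 3000, 'ttl_ms': 2500}
    print(effective_heartbeat(0, 3000, 2500))
    # {'ivl_ms': 0, 'timeout_ms': 0, 'ttl_ms': 0}
```

For the sleeping-laptop case, a TTL of a few seconds would let the remote side notice the dead connection on its own rather than waiting for TCP to give up.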
ZMQ_CONNECT_TIMEOUT: Set connect() timeout
Sets how long to wait before timing-out a connect() system call. The connect() system call normally takes a long time before it returns a time out error. Setting this option allows the library to time out the call at an earlier interval.
Setting this to just a few [ms] may help unmask intermittent interruptions and/or open new service windows in between them. For this short-window strategy to work, one ought also to reduce the maximum value permitted by ZMQ_HANDSHAKE_IVL.
ZMQ_BACKLOG: Set maximum length of the queue of outstanding connections
The ZMQ_BACKLOG option shall set the maximum length of the queue of outstanding peer connections for the specified socket; this only applies to connection-oriented transports. For details refer to your operating system documentation for the listen function.
A hundred here may serve well enough, but without details about the number of "lost" connections, let's keep it on the troubleshooter's shopping list.
Upvotes: 1