Problem receiving multicast traffic from several groups on one socket

Question

I am working on an application in C that listens to several multicast groups on one single socket. I am disabling the socket option: IP_MULTICAST_ALL. The socket is receiving traffic from 20 different multicast groups. This traffic arrives in bursts to the socket. One of those channels only publishes one message per second and no problem has been observed here.

I have also a reliable protocol for this multicasts feeds. If one listener misses a message then the protocol tries to recover the message by talking with the source via messages, then a retransmission is performed via the same channel as usual.

The problem appears when there are message bursts that arrived to the socket, then the RUDP protocol forces the retransmission of those messages. The messages arrive without problem but if the message-burst groups stop to retransmit new data because they don't have any more traffic to send, sometimes (it is pretty easy to reproduce it) the socket does not read those pending incoming messages from these groups if a periodic message arrives from a different group (the one that has tiny traffic, tiny and periodic traffic). The situation up-to here is that there are many incoming messages sent before, pending to be read by the application (no more data is sent via this groups), periodic messages that arrive from the other group that sends periodically a few messages.

What I have seen here is that application reads a message from the group that periodically sends a few messages and then a batch of messages from other groups (the burst groups). The socket is configured as non-blocking and I get the EAGAIN errno everytime a batch of messages is read from the socket, then there is no more data to read until the socket gets a new message from the periodic group, then this message is read and a batch of the other pending messages from the other groups (the application is only reading from one single socket). I made sure the other groups do not produce more data because I tested stopping the other processes to send more data. So all the pending messages on these groups are already been sent.

The most surprising fact is that if I prevent the process that writes to the periodic group to send more messages, then the listener socket gets magically all the pending traffic from the groups that published a burst of messages before. It is like if the traffic of the periodic group stops somehow the processing of the traffic from the groups that do not publish new data but the buffers are plenty of it.

At first I thought it was related with IGMP or the poll mechanism (my application can perform active waiting or blocking waiting). The blocking waiting is implemented with the non-blocking socket but if the errno is set to EAGAIN then the app waits on a poll for new messages. I get the same behavior in both situations. I don't think it is the IGMP because the IGMP_SNOOPING is off in the switches and because I reproduce the same behavior using one computer loopback for all this communications betweeen processes.

I also reproduce this behavior using kernel-bypass technologies (not using the kernel API to deal with the network), so it does not seem related to TCP/IP stack. Using kernel-bypass technologies I have the same paradigm: one message interface that gets all the traffic from all the groups. In this scenario all the processes use this mechanism to communicate, not several TCP/IP and several kernel-bypass. The model is homogeneous.

How could it be that I only receive batches of messages (but not all) when I am receiving live traffic from several groups but then I receive all the pending traffic if I stop the periodic traffic that arrives from a different multicast group?. This periodic traffic group it is only one message per second. The burst group does not publish anymore as all the messages were already published.

Please, does someone have an idea what should I check next?

Problem receiving multicast traffic from several groups on one socket

Answers (0)

Related Questions