Reputation: 22241
A client has reported an error I can't explain. A TCP-based client connects to the server and receives data from it, rarely sending anything. Usually everything works fine, but once in a blue moon a situation like this occurs:
Here is how the TCP connection is established (stripped of all logging, return-value checks, etc.):
struct sockaddr_in addr;
int sock, ret;
memset(&addr, 0, sizeof(addr)); /* clear sin_zero and any padding */
ret = inet_pton(AF_INET, conn->address, &addr.sin_addr);
addr.sin_port = htons(conn->port); /* Server port */
addr.sin_family = AF_INET;
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
connect(sock, (struct sockaddr *) &addr, sizeof(addr));
And here is the read wrapper:
int32_t _readn ( int fd, uint8_t *vptr, int32_t n )
{
    int32_t nleft;
    ssize_t nread;              /* read() returns ssize_t */
    uint8_t* ptr;
    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ((nread = read (fd, ptr, nleft)) < 0) {
            if (errno == EINTR) {
                nread = 0;      /* interrupted by a signal: call read() again */
            } else {
                return E_NETWORK_ERROR;
            }
        } else if ( nread == 0 ) {
            break;              /* EOF: the peer closed the connection */
        }
        nleft -= nread;
        ptr += nread;
    }
    return (n-nleft);           /* less than n if EOF arrived first */
}
Is it possible for the read call to block forever, even after the connection has been closed?
Is there a subtle bug in my wrapper that could cause this? Should I set any socket options when connecting?
Upvotes: 1
Views: 3658
Reputation: 22241
I ended up using a select-based function to check whether data is available before reading.
The reason behind the mysterious data loss is still unknown (no server error was confirmed), but this seems to do the trick:
int32_t isReadDataAvailableOnSocket ( int sock, uint32_t waitTimeUs )
{
    fd_set fds;
    int ret = 0;                        /* select() returns int */
    struct timeval timeout;
    struct timeval* timeoutPtr = NULL;  /* NULL: select() waits indefinitely */
    if (waitTimeUs>0) {
        timeout.tv_sec = waitTimeUs / 1000000;
        timeout.tv_usec = waitTimeUs % 1000000;
        timeoutPtr = &timeout;
    }
    FD_ZERO ( &fds );
    FD_SET ( sock, &fds );
    ret = select ( sock+1, &fds, NULL, NULL, timeoutPtr );
    if (ret == -1) {
        WARN("select failed for socket=[%d]", sock);
        return E_NETWORK_ERROR;
    }
    /* ret == 0 means the timeout expired, so FD_ISSET is false */
    if ( ! FD_ISSET(sock, &fds) )
    {
        return E_NO_DATA;
    }
    else
    {
        return 0;   /* readable: data is waiting, or the peer closed (read() returns 0) */
    }
}
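One detail worth knowing about the approach above: select() also reports a socket as readable when the peer has closed the connection, and the following read() then returns 0. The sketch below, assuming a POSIX system, folds the wait and the read into one hypothetical helper, `wait_and_read`, with made-up return codes (`READ_DATA`, `READ_TIMEOUT`, `READ_EOF`, `READ_ERROR`) so that a timeout and an end-of-file are handled separately. It is an illustration, not part of the original code.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical return codes, for this sketch only. */
enum { READ_DATA = 1, READ_TIMEOUT = 0, READ_EOF = -1, READ_ERROR = -2 };

/* Wait up to timeout_ms for sock to become readable, then read once.
 * "Readable" also covers end-of-file: after the peer closes, select()
 * reports the socket ready and read() returns 0. */
int wait_and_read(int sock, char *buf, size_t len, long timeout_ms, ssize_t *got)
{
    fd_set fds;
    struct timeval tv;
    int ret;

    for (;;) {
        FD_ZERO(&fds);
        FD_SET(sock, &fds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ret = select(sock + 1, &fds, NULL, NULL, &tv);
        if (ret == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry with a fresh timeout */
        break;
    }
    if (ret == -1)
        return READ_ERROR;
    if (ret == 0)
        return READ_TIMEOUT;    /* nothing arrived within timeout_ms */

    *got = read(sock, buf, len);
    if (*got > 0)
        return READ_DATA;
    if (*got == 0)
        return READ_EOF;        /* orderly shutdown by the peer */
    return READ_ERROR;
}
```

With a socketpair() standing in for the TCP connection, calling it before anything is written returns READ_TIMEOUT, after the peer writes it returns READ_DATA, and after the peer calls close() it returns READ_EOF.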
Upvotes: 2
Reputation: 35613
The source of the problem is that read() blocks when there is no data to read, for example when the peer has written fewer than the n bytes you expect. This is known as a blocking read.
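If you would rather keep the blocking _readn() loop but put an upper bound on each wait, one option (not something this answer itself proposes) is the standard SO_RCVTIMEO socket option. A minimal sketch, assuming a POSIX system; `set_read_timeout` and `read_timed_out` are hypothetical helper names:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: give every blocking read()/recv() on sock an
 * upper bound of timeout_ms. Returns 0 on success, -1 (errno set) on failure. */
int set_read_timeout(int sock, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: returns 1 if the errno left by a failed
 * read()/recv() means the receive timeout expired. */
int read_timed_out(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```

After `set_read_timeout(sock, 5000)`, a read() that sees no data for five seconds fails with EAGAIN or EWOULDBLOCK instead of hanging. The _readn() wrapper in the question maps that failure to E_NETWORK_ERROR, so the caller has to check errno to tell a timeout from a real error.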
To find out whether data is available before reading, use select(), as Jite says.
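Finally, a firewall may be dropping a live connection. Some firewalls are configured to cut connections that have been open longer than a given time, e.g. 30 minutes. If the firewall discards the connection without sending a reset, the client is never notified, so a blocking read() on it waits indefinitely. Probably this is not what you have, however.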
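For connections that die silently (a firewall dropping state, a crashed peer), TCP keepalive makes the kernel probe an idle connection and report an error once the probes go unanswered; a blocked read() then fails, typically with ETIMEDOUT. A minimal sketch, assuming Linux or another system that defines TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (these are not part of POSIX, so the code guards them with #ifdef); `enable_keepalive` and its parameters are hypothetical:

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT where available */
#include <sys/socket.h>   /* setsockopt, SO_KEEPALIVE */

/* Hypothetical helper: enable TCP keepalive on sock.
 * idle_s:  seconds of inactivity before the first probe
 * intvl_s: seconds between unanswered probes
 * cnt:     unanswered probes before the connection is declared dead
 * The three tuning values are applied only where the platform defines
 * the matching option; otherwise the system-wide defaults are used.
 * Returns 0 on success, -1 (errno set) on the first failure. */
int enable_keepalive(int sock, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    (void)idle_s; (void)intvl_s; (void)cnt; /* unused if the options are missing */

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) == -1)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) == -1)
        return -1;
#endif
    return 0;
}
```

For example, `enable_keepalive(sock, 60, 10, 5)` would start probing after 60 seconds of silence and give up after five unanswered probes sent 10 seconds apart, roughly 110 seconds in total. With a short idle time, the probes may also keep an idle-timeout firewall from expiring the connection.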
Upvotes: 4