Dariusz

Reputation: 22241

How can a TCP socket read call never return?

A client has reported an error I fail to comprehend. A TCP-based client is connected to a server from which it receives data, and it rarely sends anything itself. Usually everything works fine, but once in a blue moon a situation like this occurs:

Here is how the TCP connection is established (stripped of all logs, return checks, etc.):

ret = inet_pton(AF_INET, conn->address, &addr.sin_addr);
addr.sin_port        = htons(conn->port); /* Server port */
addr.sin_family      = AF_INET;
sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
connect(sock, (struct sockaddr *) &addr, sizeof(addr));

And here is the read wrapper:

int32_t _readn ( int fd, uint8_t *vptr, int32_t n )
{
  int32_t  nleft;
  int32_t  nread;
  uint8_t*     ptr;

  ptr = vptr;
  nleft = n;
  while (nleft > 0) {
    if ((nread = read (fd, ptr, nleft)) < 0) {
      if (errno == EINTR) {
        nread = 0;
      } else {
        return E_NETWORK_ERROR;
      }
    } else if ( nread == 0 ) {
      break;
    }
    nleft -= nread;
    ptr   += nread;
  }
  return  (n-nleft);
}

Is it possible for the read call to block forever, even after the connection is closed?

Is there some kind of tricky error in my wrapper that I didn't notice that may cause this? Should I set some flags for the socket on connection?

Upvotes: 1

Views: 3658

Answers (2)

Dariusz

Reputation: 22241

I ended up using a select-based function to check if data is available.

While the reason behind the mysterious data loss is still unknown (no server error confirmed), this seems to do the trick:

/* Returns 0 if the socket is readable (data or end-of-stream),
 * E_NO_DATA on timeout, E_NETWORK_ERROR if select() fails.
 * waitTimeUs == 0 means: wait indefinitely. */
int32_t isReadDataAvailableOnSocket ( int sock, uint32_t waitTimeUs )
{
  fd_set fds;
  int ret = 0;                        /* select() returns int */
  struct timeval timeout;
  struct timeval* timeoutPtr = NULL;  /* NULL makes select() block without limit */

  if (waitTimeUs>0) {
    timeout.tv_sec = waitTimeUs / 1000000;
    timeout.tv_usec = waitTimeUs % 1000000;
    timeoutPtr = &timeout;
  }

  FD_ZERO ( &fds );
  FD_SET ( sock, &fds );

  ret = select ( sock+1, &fds, NULL, NULL, timeoutPtr );
  if (ret == -1) {
    WARN("select failed for socket=[%d]", sock);
    return E_NETWORK_ERROR;
  }
  if ( ! FD_ISSET(sock, &fds) )
  {
    return E_NO_DATA;   /* timed out, nothing to read */
  }
  else
  {
    return 0;
  }
}
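
One way such a check could be wired into the read loop itself is sketched below. This is only an illustration: the name `readn_timeout` and the return codes `READN_TIMEOUT` and `READN_ERROR` are hypothetical and not part of the original code, and the timeout applies to each chunk rather than to the whole message.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical return codes, chosen for this sketch only. */
#define READN_TIMEOUT (-2)
#define READN_ERROR   (-1)

/* Read up to n bytes, waiting at most timeout_ms for each chunk.
 * Returns the number of bytes read (short on end-of-stream),
 * READN_TIMEOUT if no data arrived in time, READN_ERROR on failure. */
int32_t readn_timeout(int fd, uint8_t *buf, int32_t n, int timeout_ms)
{
    int32_t nleft = n;

    while (nleft > 0) {
        fd_set fds;
        struct timeval tv;
        int ready;
        ssize_t nread;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec  = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        ready = select(fd + 1, &fds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return READN_ERROR;
        }
        if (ready == 0)
            return READN_TIMEOUT;    /* nothing arrived within the limit */

        nread = read(fd, buf, (size_t)nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;
            return READN_ERROR;
        }
        if (nread == 0)
            break;                   /* orderly close by the peer */

        nleft -= (int32_t)nread;
        buf   += nread;
    }
    return n - nleft;
}
```

A caller would treat `READN_TIMEOUT` as a sign that the peer has gone silent and decide whether to retry or reconnect. Note that bytes already consumed before a timeout are not reported by this sketch.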

Upvotes: 2

Ben

Reputation: 35613

The source of the problem is that read blocks when there is no data to read, for example when fewer than the expected n bytes have been written. This is known as a blocking read.

To discover whether there is data, use select, as Jite says.

Finally, you may have a firewall dropping a live connection. Some firewalls are configured to cut connections that have been open for longer than a given time, e.g. 30 minutes. This is probably not what you are seeing, however.
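
If a silently dropped connection does turn out to be the cause, TCP keepalive is one way to let the client notice it, since a mostly idle receiver otherwise never learns that the peer is gone. The sketch below assumes a hypothetical helper name `enable_keepalive`; `SO_KEEPALIVE` is standard, whereas the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` tuning options are not available everywhere, so they are only used when the headers define them.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive probes on a TCP socket.
 * idle_s:     seconds of silence before the first probe
 * interval_s: seconds between probes
 * probes:     unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int sock, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Per-socket tuning where supported; otherwise system defaults apply. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
#else
    (void)idle_s; (void)interval_s; (void)probes;
#endif
    return 0;
}
```

It would be called once after connect, e.g. `enable_keepalive(sock, 60, 10, 5)`. When the probes go unanswered, a blocked read fails with an error (typically ETIMEDOUT) instead of waiting forever.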

Upvotes: 4
