xyf

Reputation: 714

select() returns 0 with the error "Operation now in progress" after connect()

I am setting up a TCP connection to a server. The function below is invoked from an application in a loop, and sometimes I see the following error:

select() timed out after 4 seconds - Operation now in progress

This means select() returned 0, i.e. it timed out after 4 seconds without observing any activity on the file descriptor.

My understanding is that nonblocking mode is set after connect() in case it doesn't connect right away, with getsockopt() indicating whether the connection was actually established. For some reason, though, select() seems to be returning 0. Does it have to do with the delay being too small?

int InitializeSocket(int sockType, int protocol, long timeout)
{
    int socketFd = socket(AF_INET, sockType, protocol);
    if (socketFd < 0)
    {
        perror ("Failed to create a client socket"); // perror() takes a single string, not a format
        return -1;
    }
    
    if (timeout > 0)
    {
        struct timeval sockTimeout = {.tv_sec = timeout, .tv_usec = 0};

        // setting the receive timeout
        if (setsockopt(socketFd, SOL_SOCKET, SO_RCVTIMEO, &sockTimeout, sizeof(sockTimeout)) < 0) 
        {
            perror ("Failed to set the RX timeout");
            return -1;
        }

        // setting the send timeout
        if (setsockopt(socketFd, SOL_SOCKET, SO_SNDTIMEO, &sockTimeout, sizeof(sockTimeout)) < 0) 
        {
            perror ("Failed to set the TX timeout");
            return -1;
        }
    }
    return socketFd;
}

int OpenTcpConnection(int serverTimeout, int port, const char *ipAddr)
{
    struct sockaddr_in address;
    
    int socketFd = InitializeSocket(SOCK_STREAM, 0, serverTimeout);
    if (socketFd == -1)
    {
        return -1;
    }
    
    // zero the structure, then fill in the server address in network byte order
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    address.sin_addr.s_addr = inet_addr(ipAddr);
    
    // get the existing file flags
    long arg = 0;
    if( (arg = fcntl(socketFd, F_GETFL, NULL)) < 0) 
    { 
        perror ("Failed to get file status flags"); 
        return -1; 
    } 

    // set the socket to nonblocking mode
    arg |= O_NONBLOCK; 
    if( fcntl(socketFd, F_SETFL, arg) < 0) 
    { 
        perror ("Failed to set to nonblocking mode");
        return -1;
    } 
    
    // connect to the server
    int res = connect(socketFd, (struct sockaddr *) &address, sizeof(address));

    fd_set fdset;
    struct timeval tv;
    long selectTimeout = 4; // connect() timeout

    if (res < 0) 
    { 
        // the socket is nonblocking & the connection cannot be completed immediately
        if (errno == EINPROGRESS) 
        { 
            do 
            { 
                tv.tv_sec = selectTimeout; 
                tv.tv_usec = 0; 
                FD_ZERO(&fdset); 
                FD_SET(socketFd, &fdset); 
                res = select(socketFd+1, NULL, &fdset, NULL, &tv); 
                
                if (res < 0 && errno != EINTR) 
                { 
                    perror ("Failed to monitor socket FD"); // perror() takes a single string, not a format
                    return -1;
                } 
                else if (res > 0) 
                { 
                    int valopt = 0;
                    socklen_t len = sizeof(valopt);

                    // check whether connect() completed successfully
                    if (getsockopt(socketFd, SOL_SOCKET, SO_ERROR, (void*)(&valopt), &len) < 0) 
                    { 
                        perror ("Error in getsockopt"); 
                        return -1;
                    } 
               
                    if (valopt) 
                    { 
                        // SO_ERROR holds the connect() failure code; perror() reads errno
                        errno = valopt;
                        perror ("Error in delayed connection");
                        return -1;
                    }
                    break;
                } 
                else
                {
                    fprintf (stderr, "select() timed out after %ld seconds - %s\n", selectTimeout, strerror(errno)); // ERROR HERE !!!
                    return -1;
                }
            } while(1);
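        }
        else
        {
            // connect() failed for a reason other than EINPROGRESS
            perror ("Failed to connect");
            return -1;
        }
    }
    return socketFd;
}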

Upvotes: 0

Views: 1547

Answers (2)

Luis Colorado

Reputation: 12708

My understanding is that nonblocking mode is set after connect() in case it doesn't connect right away, with getsockopt() indicating whether the connection was actually established. For some reason, though, select() seems to be returning 0. Does it have to do with the delay being too small?

Nonblocking mode has to be set after the socket(2) call and before the connect(2) call. Otherwise connect(2) blocks (never reaching the select() call) until it either succeeds or fails. The default connect timeout is normally over two minutes, and this trick is used to wait only 4 s for the connection instead.
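A minimal sketch of that ordering, assuming a POSIX system. The name set_nonblocking is a hypothetical helper for illustration, not something from the question's code:

```c
#include <fcntl.h>

/* Hypothetical helper: switch fd to nonblocking mode.
 * Call it after socket(2) and before connect(2).
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read the current file status flags */
    if (flags < 0)
        return -1;
    /* keep the existing flags and add O_NONBLOCK */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```

With the flag set, connect(2) on a TCP socket usually returns -1 with errno set to EINPROGRESS instead of blocking, and completion is then awaited with select().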

A 4-second delay is on the short side for a connection to a remote site. On a LAN, if you are not connected within 4 s, something is wrong.

My bet is that something is wrong: you are trying to connect to an endpoint that is not available (a non-existent host, or a server that is not listening on the address:port you are connecting to), you have forgotten to convert some field of the sockaddr_in structure to network byte order (this appears to be correct in your snippet), or a firewall is blocking the connection (this may well be it). Waiting for socketFd to become writable is correct, since the socket cannot be written to until the connection is established. So you appear to be doing things correctly, and the likely cause is a misspelled address or a firewall cutting off access to the server.
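To rule out a misspelled address early, the address string can be validated before connecting. The following is only a sketch: make_ipv4_addr is a hypothetical helper, and it swaps the question's inet_addr() for inet_pton(), which reports a malformed string through its return value (inet_addr() returns INADDR_NONE in that case, which the question's code does not check):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill *out for an IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string
 * or the port is out of range. */
int make_ipv4_addr(const char *ip, int port, struct sockaddr_in *out)
{
    if (port < 1 || port > 65535)
        return -1;

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons((uint16_t)port);   /* network byte order */

    /* inet_pton() returns 1 only when the text parsed as an IPv4 address */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```

A failed check here points at the configuration rather than at the network, which narrows down where to look.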

Either way, a timeout in select() is not an error, just a timeout. The software you are using treats a 4 s timeout on the socket as fatal, so you have to ask the developer or check your network connection.
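This also likely explains the "Operation now in progress" text. select() sets errno only when it returns -1. On a timeout it returns 0 and leaves errno alone, so a perror()-style message printed at that point shows whatever errno held before, which here is most probably the EINPROGRESS left over from connect(). A small sketch of reporting the three outcomes separately (describe_select_result is a hypothetical helper, and the message texts are my own):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe the result of a select() call.
 * saved_errno is the value of errno captured right after select() returned;
 * it is only meaningful when res < 0. */
const char *describe_select_result(int res, int saved_errno,
                                   char *buf, size_t buflen)
{
    if (res < 0)
        snprintf(buf, buflen, "select() failed: %s", strerror(saved_errno));
    else if (res == 0)
        /* timeout: select() did not set errno, so it is not printed */
        snprintf(buf, buflen, "select() timed out");
    else
        snprintf(buf, buflen, "%d descriptor(s) ready", res);
    return buf;
}
```

Called as describe_select_result(res, errno, buf, sizeof buf) right after select(), this keeps a stale errno out of the timeout message.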

Upvotes: 0

Jeremy Friesner

Reputation: 73294

Per the select() man page:

select() returns the number of ready descriptors that are contained in
the descriptor sets, or -1 if an error occurred.  If the time limit
expires, select() returns 0.

... so if select() is returning 0, it's because no I/O operations were completed before your timeout was reached.

As for why no I/O operations were completed: if you were waiting for a TCP connection to complete, then the most likely explanation is that the TCP connection hadn't completed yet (perhaps because of a slow, overloaded, or broken network or server?).

Another (less likely, but possible) explanation might be that you are running your program under Windows. Under Windows, if a non-blocking connect() fails, that failure is indicated by setting a bit in the exceptions fd_set (i.e. the one that you would pass in as the fourth argument to select(), just before the timeout argument). In the posted code you are passing in NULL for that argument, which means that under Windows you would have no way of knowing when your non-blocking TCP connection attempt has failed. (Under other OSes, a failed connection causes the socket to select as both ready-for-read and ready-for-write, which makes a connection failure easier to react to.)

Upvotes: 1
