Can an accept socket have a transient failure that's worth retrying?

Question

This question is primarily about boost::asio, but those following the socket tag will probably have some insight into transient failures of the accept call.

In Boost.Asio, suppose I have a socket acceptor coded to continuously accept new connections:

void Acceptor::StartNextAccept()
{
    // _acceptor is of type boost::asio::ip::tcp::acceptor

    _acceptor->async_accept([this](const boost::system::error_code& ec, boost::asio::ip::tcp::socket sock) {
        if (ec)
        {
            // error
            LogErrorCode(ec);
        }
        else
        {
            // success
            HandleNewConnection(std::move(sock)); // hand the accepted socket (move-only) to the handler
        }

        StartNextAccept(); // enqueue another accept call regardless of success or error case

    });
}

My concern is that if the acceptor socket gets into an error state, the above code will loop forever: log the failure, enqueue a new attempt, and repeat, burning up a core and filling up a log file needlessly.

Which is the better assumption:

1. async_accept calls should never fail on valid sockets. Don't worry about the above code since you diligently checked for errors in initializing the socket and tested your code.
2. async_accept calls can fail, but it never makes sense to retry them, so just close this socket and get out of the retry loop.
3. async_accept calls can have transient failures. Check the error code to determine if it's worth retrying.

If #3 above is the correct assumption, what are the recommended error codes to check for? And if the error is transient (such as low machine resources or running out of handles), does it make sense to wait a few seconds before retrying so that the thread doesn't burn a core?

Update: for what it's worth, my primary platforms are Mac and Windows 10.

sehe · Accepted Answer

Can network layers have transient problems that are worth retrying? Yes.

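However, Linux reports network errors that were already pending on a queued connection (from the backlog) as the result of accept() itself, which differs from other BSD socket implementations. From the Linux accept(2) man page: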

  Error handling
       Linux accept() (and accept4()) passes already-pending network errors
       on the new socket as an error code from accept().  This behavior
       differs from other BSD socket implementations.  For reliable
       operation the application should detect the network errors defined
       for the protocol after accept() and treat them like EAGAIN by
       retrying.  In the case of TCP/IP, these are ENETDOWN, EPROTO,
       ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and
       ENETUNREACH.

Other conditions, e.g. EWOULDBLOCK/EAGAIN and EFAULT, don't apply with Asio's async_accept.

See boost::asio::error for the corresponding error_code names: https://www.boost.org/doc/libs/master/boost/asio/error.hpp

Otherwise, go through the documented list of system errors and decide which ones you think are worth handling explicitly.
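As a rough sketch of that triage, the errno values named in the man page excerpt above could be grouped separately from the resource-exhaustion cases the question mentions. The names AcceptErrorKind and ClassifyAcceptErrno are hypothetical, not part of Boost.Asio, and treating EMFILE, ENFILE, ENOBUFS and ENOMEM as "retry later" is my assumption. It also assumes a POSIX platform where ec.value() holds an errno value; on Windows the values are Winsock codes, so comparing against the boost::asio::error names would be the safer route there.

```cpp
#include <cerrno>

// Hypothetical classification, not part of Boost.Asio.
enum class AcceptErrorKind {
    RetryNow,   // pending network error on a queued connection: accept again
    RetryLater, // resource exhaustion (assumption): pause before accepting again
    Fatal       // anything else: end the chain and re-create the acceptor
};

// Classifies a raw errno value as reported for accept() on POSIX systems.
inline AcceptErrorKind ClassifyAcceptErrno(int err)
{
    switch (err) {
    // Network errors that accept(2) on Linux says to treat like EAGAIN.
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
#ifdef EHOSTDOWN // not defined on every platform
    case EHOSTDOWN:
#endif
#ifdef ENONET    // Linux-specific
    case ENONET:
#endif
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return AcceptErrorKind::RetryNow;

    // Out of descriptors or memory: retrying immediately would likely spin.
    case EMFILE:
    case ENFILE:
    case ENOBUFS:
    case ENOMEM:
        return AcceptErrorKind::RetryLater;

    default:
        return AcceptErrorKind::Fatal;
    }
}
```

Inside the accept handler this could be called as ClassifyAcceptErrno(ec.value()), with only the Fatal case ending the accept chain.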

In my code I usually just end the chain:

_acceptor->async_accept([this](const boost::system::error_code& ec, boost::asio::ip::tcp::socket sock) {
    if (ec) {
        LogErrorCode(ec);
    } else {
        HandleNewConnection(std::move(sock)); // hand the accepted socket (move-only) to the handler
        StartNextAccept();
    }
});

When that happens, my server just reinitializes the listener (acceptor in Asio speak). Of course, that itself might fail, in which case the server would probably shut down.

You may or may not have QoS requirements that prompt you to handle individual conditions differently.
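If one of those conditions is resource exhaustion, as the question suggests, a capped exponential delay is one way to keep a retrying acceptor from burning a core. This is only a sketch: AcceptRetryDelay is a hypothetical helper, and the 100 ms base and 5 s cap are arbitrary assumptions rather than recommended values.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: how long to wait before the next accept attempt
// after `consecutive_failures` failed attempts in a row (0 = no failure).
inline std::chrono::milliseconds AcceptRetryDelay(unsigned consecutive_failures)
{
    using std::chrono::milliseconds;
    constexpr milliseconds base{100};  // assumed first delay
    constexpr milliseconds cap{5000};  // assumed upper bound

    if (consecutive_failures == 0) {
        return milliseconds{0};
    }
    // Double the delay per additional failure; clamp the shift so the
    // multiplication cannot overflow before the cap is applied.
    const unsigned shift = std::min(consecutive_failures - 1, 16u);
    const milliseconds delay = base * (1 << shift);
    return std::min(cap, delay);
}
```

With Asio, the returned duration could be used to arm a boost::asio::steady_timer whose handler calls StartNextAccept(), resetting the failure counter after the next successful accept.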

Ultimately, re-initializing the acceptor might be the more robust option, e.g. when the network configuration has changed.
