Reputation: 104514
This question is primarily for boost::asio, but those on the socket tag will probably have some insight into transient failures of the accept call.
In Boost::Asio, I have a socket acceptor coded to continuously accept new connections:
void Acceptor::StartNextAccept()
{
    // _acceptor is of type boost::asio::ip::tcp::acceptor
    _acceptor->async_accept([this](const boost::system::error_code& ec, boost::asio::ip::tcp::socket sock) {
        if (ec)
        {
            // error
            LogErrorCode(ec);
        }
        else
        {
            // success (tcp::socket is move-only, so hand it over with std::move)
            HandleNewConnection(std::move(sock));
        }
        StartNextAccept(); // enqueue another accept call regardless of success or error case
    });
}
My concern is that if the acceptor socket gets into an error state, the above code will loop forever: logging the failure, enqueuing a new attempt, and failing again, burning up a core and needlessly filling up a log file.
Which is the better assumption:
1. async_accept calls should never fail on valid sockets. Don't worry about the above code since you diligently checked for errors in initializing the socket and tested your code.
2. async_accept calls can fail, but it never makes sense to retry them, so just close this socket and get out of the retry loop.
3. async_accept calls can have transient failures. Check the error code to determine if it's worth retrying.
If #3 above is the correct assumption, what are the recommended error codes to check for? And if the error is transient (such as low machine resources, out of handles, etc...) does it make sense to wait a few seconds before retrying so that the thread doesn't burn a core?
Update: For what it's worth, my primary platforms are Mac and Windows 10.
Upvotes: 1
Views: 438
Reputation: 392911
Can network layers have transient problems that are worth retrying? Yes.
However, on Linux, accept errors are returned from the pending connection list (backlog), whereas e.g. BSD reports them directly.
The Linux accept(2) man page says, under "Error handling":
"Linux accept() (and accept4()) passes already-pending network errors on the new socket as an error code from accept(). This behavior differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying. In the case of TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH."
Other conditions, e.g. EWOULDBLOCK/EAGAIN and EFAULT, don't apply with Asio's async_accept.
See boost::asio::error for the corresponding error_code names: https://www.boost.org/doc/libs/master/boost/asio/error.hpp
Otherwise go through the list of system errors documented and see which you think are worth handling explicitly.
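As a rough illustration of checking for the errors the man page says to treat like EAGAIN, here is a minimal sketch. The helper name IsRetryableAcceptError is hypothetical, and it takes a std::error_code so that it compiles without Boost; adapting it to boost::system::error_code is left as an assumption about your setup. It compares raw errno values, so it only makes sense on POSIX-like platforms (Windows reports WSA codes instead), and EHOSTDOWN and ENONET are guarded because not every platform defines them.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical helper (not part of Asio): returns true if `ec` is one of the
// network errors that the Linux accept(2) man page says to retry like EAGAIN.
inline bool IsRetryableAcceptError(const std::error_code& ec)
{
    // Only interpret the value as an errno for the system/generic categories.
    if (ec.category() != std::system_category() &&
        ec.category() != std::generic_category())
    {
        return false;
    }

    switch (ec.value())
    {
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
#ifdef EHOSTDOWN
    case EHOSTDOWN: // not defined on every platform
#endif
#ifdef ENONET
    case ENONET:    // Linux-specific
#endif
        return true;
    default:
        return false;
    }
}
```

In an accept handler, a true result would mean "log it and accept again", while anything else falls through to whatever stricter handling you choose.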
In my code I usually just end the chain:
_acceptor->async_accept([this](const boost::system::error_code& ec, boost::asio::ip::tcp::socket sock) {
    if (ec) {
        LogErrorCode(ec); // the chain ends here: no further accept is queued
    } else {
        HandleNewConnection(std::move(sock)); // tcp::socket is move-only
        StartNextAccept();
    }
});
In response, my server would just reinitialize the listener (acceptor in Asio speak). Of course, that itself might fail, in which case the server would probably shut down.
You may or may not have QoS requirements that prompt you to handle individual conditions differently.
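One condition you might handle differently is the resource exhaustion mentioned in the question (out of handles, low memory), where retrying immediately would just spin. The following is a minimal sketch of a capped exponential backoff for that case. Both helper names are hypothetical, the 100 ms base and 5 s cap are arbitrary assumptions, and the errno comparison assumes a POSIX-like platform.

```cpp
#include <algorithm>
#include <cerrno>
#include <chrono>
#include <system_error>

// Hypothetical helper: true for accept() failures caused by running out of
// file descriptors, socket buffers, or memory.
inline bool IsResourceExhaustion(const std::error_code& ec)
{
    if (ec.category() != std::system_category() &&
        ec.category() != std::generic_category())
    {
        return false;
    }
    switch (ec.value())
    {
    case EMFILE:  // per-process descriptor limit reached
    case ENFILE:  // system-wide descriptor limit reached
    case ENOBUFS: // no buffer space available
    case ENOMEM:  // out of memory
        return true;
    default:
        return false;
    }
}

// Hypothetical helper: delay before the next accept attempt, doubling with
// each consecutive failure and capped at 5 seconds (values are assumptions).
inline std::chrono::milliseconds BackoffDelay(unsigned consecutiveFailures)
{
    using std::chrono::milliseconds;
    const milliseconds base{100};
    const milliseconds cap{5000};
    const unsigned shift = std::min(consecutiveFailures, 10u); // avoid overflow
    const milliseconds delay = base * (1 << shift);
    return std::min(delay, cap);
}
```

With Asio, the returned delay could be given to a boost::asio::steady_timer (expires_after followed by async_wait) whose handler calls StartNextAccept, so the thread waits without burning a core; the failure counter would be reset after the next successful accept.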
Ultimately, re-initializing the acceptor might be more robust, e.g. when the network configuration has changed.
Upvotes: 2