Nazareth

Reputation: 33

boost::asio::ip::tcp::socket.read_some() stops working. No exception or errors detected

I am currently debugging a server (Win32/64) that uses Boost.Asio 1.78.

The code is a blend of legacy, older legacy, and some newer code. None of it is mine, so I can't answer for why something is done in a certain way. I'm just trying to understand why this is happening and hopefully fix it without rewriting it from scratch. This code has been running for years on 50+ servers with no errors; only these 2 servers misbehave.

I have one client (.NET) connected to two servers. The client sends the same data to both servers. The servers run the same code, shown in the sections below.

Everything works well, but now and then communication halts. There are no errors or exceptions on either end; it just halts, and never on both servers at the same time. This happens very seldom, roughly every 3 months or less often. I have no way of reproducing it in a debugger because I don't know where to look for this behavior.

On the client side the socket appears to be open and working but does not accept new data. No errors are detected on the socket.

Here's shortened code showing the relevant functions. I want to stress that I can't detect any errors or exceptions during these failures. The code just stops at "m_socket->read_some()".

The only way to "unblock" it right now is to close the socket manually and restart the acceptor. When I manually close the socket, the read_some method returns with an error code, so I know that is where it is stuck.

Questions:

  1. What may go wrong here and give this behavior?
  2. What should I log to determine what is happening, and where it originates?

main code:

std::shared_ptr<boost::asio::io_service> io_service_is = std::make_shared<boost::asio::io_service>();
auto is_work = std::make_shared<boost::asio::io_service::work>(*io_service_is.get());

auto acceptor = std::make_shared<TcpAcceptorWrapper>(*io_service_is.get(), port);
acceptor->start();

auto threadhandle = std::thread([&io_service_is]() {io_service_is->run();});

TcpAcceptorWrapper:

void start(){
    m_asio_tcp_acceptor.open(boost::asio::ip::tcp::v4());
    m_asio_tcp_acceptor.bind(boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), m_port));
    m_asio_tcp_acceptor.listen();
    start_internal();
}
void start_internal(){
    m_asio_tcp_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { /* Handler code */ });
}

Handler code:

m_current_session = std::make_shared<TcpSession>(&m_socket);
std::condition_variable condition;
std::mutex mutex;
bool stopped(false);

m_current_session->run(condition, mutex, stopped);              
{
    std::unique_lock<std::mutex> lock(mutex);
    condition.wait(lock, [&stopped] { return stopped; });
}

TcpSession runner:

void run(std::condition_variable& complete, std::mutex& mutex, bool& stopped){
    auto self(shared_from_this());
    
    std::thread([this, self, &complete, &mutex, &stopped]() {
        { // mutex scope

            // Lock and hold mutex from tcp_acceptor scope
            std::lock_guard<std::mutex> lock(mutex);

            while (true) {
                std::array<char, M_BUFFER_SIZE> buffer;

                try {
                    boost::system::error_code error;

                    /* Next call just hangs/blocks but only rarely. like once every 3 months or more seldom */
                    std::size_t read = m_socket->read_some(boost::asio::buffer(buffer, M_BUFFER_SIZE), error);

                    if (error || read == -1) {
                        // This never happens
                        break;
                    }
                    // inside this all is working
                    process(buffer);

                } catch (std::exception& ex) {
                    // This never happens
                    break;
                } catch (...) {
                    // Neither does this
                    break;
                }
            }
            stopped = true;
        } // mutex released
        complete.notify_one();
    }).detach();
}

Upvotes: 2

Views: 705

Answers (1)

sehe

Reputation: 392833

This:

m_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { /* Handler code */ });

Handler code:

std::condition_variable condition;
std::mutex mutex;
bool stopped(false);
m_current_session->run(condition, mutex, stopped);
{
  std::unique_lock<std::mutex> lock(mutex);
  condition.wait(lock, [&stopped] { return stopped; });
}

Is strange. It suggests you are using an "async" accept, but the handler blocks unconditionally until the session completes. That's the opposite of asynchrony. You could write the same code much more simply without the asynchrony, and also without the thread and the synchronization around it.
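For illustration, here is a minimal sketch of what that synchronous equivalent could look like. It is an assumption-laden sketch, not code from the question: the io_context setup, buffer size, and the signature of process are invented for the example.

#include <boost/asio.hpp>
#include <array>
#include <iostream>

namespace asio = boost::asio;
using asio::ip::tcp;

// Hypothetical stand-in for the question's process(); signature assumed
void process(std::array<char, 1024> const& buffer, std::size_t bytes);

void serve(unsigned short port) {
    asio::io_context io;
    tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), port));

    for (;;) {
        tcp::socket socket = acceptor.accept(); // blocks until a client connects

        for (;;) {
            std::array<char, 1024> buffer;
            boost::system::error_code ec;
            std::size_t n = socket.read_some(asio::buffer(buffer), ec);
            if (ec) { // includes EOF when the peer closes the connection
                std::cerr << "read: " << ec.message() << "\n";
                break;
            }
            process(buffer, n);
        }
    }
}

With only blocking calls there is no io_service, no work guard, no detached thread, and no condition variable left to reason about.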

My intuition says something is blocking on the mutex. Have you established that the session thread's stack is actually inside the read_some frame, e.g. by breaking in a debugger during a "lock-up"?
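Until that is confirmed, bracketing the call with log lines is a cheap way to record it. A sketch only, assuming a plain stream logger is acceptable; the helper name is made up:

#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

// Hypothetical helper: wraps read_some with before/after log lines so a
// post-mortem log shows whether the thread entered the call and never returned.
std::size_t logged_read(boost::asio::ip::tcp::socket& socket,
                        boost::asio::mutable_buffer buf,
                        boost::system::error_code& ec)
{
    auto stamp = [] {
        return std::chrono::system_clock::now().time_since_epoch().count();
    };

    boost::system::error_code ep_ec;
    std::cerr << stamp() << " entering read_some, remote="
              << socket.remote_endpoint(ep_ec) << "\n";

    std::size_t n = socket.read_some(buf, ec);

    std::cerr << stamp() << " read_some returned " << n
              << " bytes, error=" << ec.message() << "\n";
    return n;
}

In the session loop the call would become logged_read(*m_socket, boost::asio::buffer(buffer, M_BUFFER_SIZE), error); the remote endpoint, the byte count and the error code are the parameters worth having in the log.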

You write: "When I manually close the socket the read_some method returns with error code so I know it is inside there I have an issue."

You can't legally do that. Your socket is in use on another thread - in a blocking read - and you are necessarily closing it from a separate thread. That's a race condition (see the docs). If you want cancellable operations, use async_read*.
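A sketch of what a cancellable read could look like, assuming a steady_timer on the same executor and a single thread running the io_context; the 30-second timeout and the class shape are assumptions, not anything from the question:

#include <boost/asio.hpp>
#include <array>
#include <chrono>
#include <iostream>
#include <memory>

namespace asio = boost::asio;
using asio::ip::tcp;

struct Session : std::enable_shared_from_this<Session> {
    explicit Session(tcp::socket socket)
        : socket_(std::move(socket)), timer_(socket_.get_executor()) {}

    void start() { do_read(); }

  private:
    void do_read() {
        auto self = shared_from_this();

        timer_.expires_after(std::chrono::seconds(30)); // assumed timeout
        timer_.async_wait([self](boost::system::error_code ec) {
            if (!ec) self->socket_.cancel(); // deadline reached: cancel the pending read
        });

        socket_.async_read_some(asio::buffer(buffer_),
            [self](boost::system::error_code ec, std::size_t n) {
                self->timer_.cancel();
                if (ec) { // operation_aborted when cancelled, EOF when the peer closes
                    std::cerr << "read: " << ec.message() << "\n";
                    return;
                }
                // process(buffer_, n) would go here
                self->do_read();
            });
    }

    tcp::socket socket_;
    asio::steady_timer timer_;
    std::array<char, 1024> buffer_{};
};

Because both completion handlers run on the io_context, the cancel is no longer issued concurrently with a blocking read on another thread, which removes the race described above.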

There are more code smells (read_some is a low-level primitive that is rarely what you want at the application level; detached threads with manual synchronization on termination could be packaged tasks; shared boolean flags could be std::atomic; notify_one outside the mutex could lead to thread starvation on some platforms; etc.).
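For instance, if the accept handler really must wait for the session to finish, a std::future would replace the detached thread, the mutex, the condition variable and the shared bool in one go. A sketch under that assumption; run_session is a hypothetical helper standing in for the read loop:

#include <future>
#include <memory>

struct TcpSession; // as in the question
void run_session(std::shared_ptr<TcpSession> session); // hypothetical: runs the read loop inline

// Inside the accept handler, instead of condition_variable + mutex + bool:
void handle_accept(std::shared_ptr<TcpSession> session) {
    auto done = std::async(std::launch::async,
                           [session] { run_session(session); });
    done.wait(); // same "block until the session ends" semantics, no manual signalling
}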

If you can share more code I'll be happy to sketch simplified solutions that remove the problems.

Upvotes: 1
