Reputation: 3630
I am compiling the latest version of ASIO with MinGW-w64 on Windows.
I have sandbox code for accepting TCP connections. I use one io_context, a strand each for the acceptor and the socket, and 2 threads (I have read in the documentation that posting into two different strands does not guarantee concurrent invocation). For some reason I get a deadlock at the end of execution and I don't know why it happens. It does not happen if there is a delay between the std::future synchronization and the request to stop the server, or if I post acceptor.cancel() to its executor explicitly. The deadlock also does not happen if I close the acceptor.
I have failed to find any relevant information in the documentation that might explain this behavior, and I don't want to ignore it since it might lead to unpredictable problems.
Here is my sandbox code:
#include <asio.hpp>
#include <cstdio>     // setvbuf
#include <functional>
#include <future>     // std::future, std::async
#include <iostream>
#include <sstream>
#include <thread>     // std::this_thread
constexpr const char localhost[] = "127.0.0.1";
constexpr unsigned short port = 12000;
void runContext(asio::io_context &io_context)
{
    std::string threadId{};
    std::stringstream ss;
    ss << std::this_thread::get_id();
    ss >> threadId;
    std::cout << std::string("New thread for asio context: ") + threadId + "\n";
    std::cout.flush();
    io_context.run();
    std::cout << std::string("Stopping thread: ") + threadId + "\n";
    std::cout.flush();
}
class server
{
public:
    template<typename Executor>
    explicit server(Executor &executor)
        : acceptor_(executor)
    {
        using asio::ip::tcp;
        auto endpoint = tcp::endpoint(asio::ip::make_address_v4(localhost), port);
        acceptor_.open(endpoint.protocol());
        acceptor_.set_option(tcp::acceptor::reuse_address(true));
        acceptor_.bind(endpoint);
        acceptor_.listen();
    }

    void startAccepting()
    {
        acceptor_.async_accept(
            [this](const asio::error_code &errorCode, asio::ip::tcp::socket peer)
            {
                if (!errorCode)
                {
                    startAccepting();
                    // std::cout << "Connection accepted\n";
                }
                if (errorCode == asio::error::operation_aborted)
                {
                    // std::cout << "Stopped accepting connections\n";
                    return;
                }
            });
    }

    void startRejecting()
    {
        // TODO: how to reject?
    }

    void stop()
    {
        // asio::post(acceptor_.get_executor(), [this](){ acceptor_.cancel(); }); // this line fixes deadlock
        acceptor_.cancel();
        // acceptor_.close(); // this line also fixes deadlock
    }

private:
    asio::ip::tcp::acceptor acceptor_;
};
int main()
{
    setvbuf(stdout, NULL, _IONBF, 0);
    asio::io_context context;

    // run server
    auto serverStrand = asio::make_strand(context);
    server server{serverStrand};
    server.startAccepting();

    // run client
    auto clientStrand = asio::make_strand(context);
    asio::ip::tcp::socket socket{clientStrand};
    size_t attempts = 1;
    auto endpoint = asio::ip::tcp::endpoint(asio::ip::make_address_v4(localhost), port);
    std::future<void> res = socket.async_connect(endpoint, asio::use_future);

    std::future<void> runningContexts[] = {
        std::async(std::launch::async, runContext, std::ref(context)),
        std::async(std::launch::async, runContext, std::ref(context))
    };

    res.get();
    server.stop();
    std::cout << "Server has been requested to stop" << std::endl;
    return 0;
}
UPDATE
According to sehe's answer, I am getting a deadlock because, when server.stop() is invoked, the completion handler for a successful accept has already been posted but, due to the cancellation, is never invoked, which leaves the context with pending work and hence a deadlock at the end (if I understood correctly).
The things I still don't understand are:
- The documentation says nothing about the thread safety of the acceptor::cancel() method, though it does say that distinct acceptor objects are safe. So I assumed that it is thread safe (no data races possible within one strand).
- @sehe's code does not cause a deadlock when the cancel is explicitly posted onto the acceptor's strand via asio::post. For 500 invocations there were no deadlocks:
test 499
Awaiting client
New thread 3
New thread 2
Completed client
Server stopped
Accept: 127.0.0.1:14475
Accept: The I/O operation has been aborted because of either a thread exit or an application request.
Stopping thread: 2
Stopping thread: 3
Everyting shutdown
However, if I remove the printing code before the synchronization and stop() (the printing is what introduced the delay), a deadlock is easily reproducible:
PS C:\dev\builds\asio_connection_logic\Release-MinGW-w64\bin> for ($i = 0; $i -lt 500; $i++){
>> Write-Output "
>> test $i"
>> .\sb.sf_example.exe}
test 0
New thread 2
New thread 3
Server stopped
Accept: 127.0.0.1:15160
PS C:\dev\builds\asio_connection_logic\Release-MinGW-w64\bin>
PS C:\dev\builds\asio_connection_logic\Release-MinGW-w64\bin> for ($i = 0; $i -lt 500; $i++){
>> Write-Output "
>> test $i"
>> .\sb.sf_example.exe}
test 0
New thread 2New thread 3
Server stopped
Accept: 127.0.0.1:15174
PS C:\dev\builds\asio_connection_logic\Release-MinGW-w64\bin> ^C
So, the conclusion is that no matter how you invoke acceptor.cancel()
, you will get a deadlock.
Upvotes: 1
Views: 1037
Reputation: 392833
I did some liposuction on the code and added some tracing:
#include <boost/asio.hpp>
#include <functional>
#include <iostream>
#include <thread> // std::thread
constexpr unsigned short port = 12000;
namespace asio = boost::asio;
using boost::system::error_code;
using asio::ip::tcp;
void runContext(asio::io_context& io_context) {
    std::cout << "New thread " << std::this_thread::get_id() << std::endl;
    io_context.run();
    std::cout << "Stopping thread: " << std::this_thread::get_id() << std::endl;
}
class server {
  public:
    template <typename Executor>
    explicit server(Executor executor) : acceptor_(executor, {{}, port}) {
        acceptor_.set_option(tcp::acceptor::reuse_address(true));
    }

    void startAccepting() {
        acceptor_.listen();
        acceptLoop();
    }

    void stop() {
        // asio::post(acceptor_.get_executor(), [this]() {
        //     acceptor_.cancel();
        // }); // this line fixes deadlock
        acceptor_.cancel();
        // acceptor_.close(); // this line also fixes deadlock
    }

  private:
    void acceptLoop() {
        acceptor_.async_accept([this](error_code errorCode, tcp::socket peer) {
            if (!errorCode) {
                std::cout << "Accept: " << peer.remote_endpoint() << std::endl;
                acceptLoop();
            } else {
                std::cout << "Accept: " << errorCode.message() << std::endl;
            }
        });
    }

    tcp::acceptor acceptor_;
};
int main() {
    setvbuf(stdout, NULL, _IONBF, 0);
    asio::io_context context;

    // run server
    server server{make_strand(context)};
    server.startAccepting();

    // run client
    tcp::socket socket{make_strand(context)};
    std::future<void> res = socket.async_connect({{}, port}, asio::use_future);

    std::thread t1(runContext, std::ref(context));
    std::thread t2(runContext, std::ref(context));

    std::cout << "Awaiting client " << std::endl;
    res.get();
    std::cout << "Completed client" << std::endl;

    server.stop();
    std::cout << "Server stopped" << std::endl;

    t1.join();
    t2.join();
    std::cout << "Everyting shutdown" << std::endl;
}
As you can see a "correct" run prints:
Awaiting client
New thread 140712013797120
Accept: New thread 140712005404416
127.0.0.1:57500
Completed client
Server stopped
Accept: Operation canceled
Stopping thread: 140712013797120
Stopping thread: 140712005404416
Everyting shutdown
However, an "incorrect run" prints:
New thread 140544269350656
Awaiting client
New thread 140544260957952
Completed client
Server stopped
Accept: 127.0.0.1:48580
^C
The key is here:
Server stopped
Accept: 127.0.0.1:48580
The cancel comes before the accept, which means there was a race: the completion handler for the async_accept was already in flight with a non-failed errorCode. (In other words, async_connect returned a bit earlier than the server was able to handle its async_accept completion.)
Indeed, posting on the strand is one way to fix it. This is because any handlers in flight will run before the cancellation, and if an async operation is still pending at that point it will be canceled.
NOTE: The other approach with acceptor_.close() invokes undefined behavior because of the data race on acceptor_ itself (which is not thread-safe).
Aside, a pet peeve: one "problem" is std::launch::async. I don't use it; I think its behaviour is implementation-defined to an extent that makes it not very useful. Perhaps use std::thread instead, since that's what you're after here. On recent Boost, use asio::thread_pool(2) (see the sketch below).
The answers here shed some light: Why should I use std::async?
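Roughly, the thread_pool variant could look like this (just a sketch, not the code above); the pool owns the worker threads, so there is no std::async or std::thread bookkeeping left in main():
// Rough sketch only: replace the two manually managed threads with a 2-thread pool.
#include <boost/asio.hpp>
#include <iostream>
namespace asio = boost::asio;

int main() {
    asio::thread_pool pool(2); // two worker threads, instead of std::async/std::thread

    // Work is handed to the pool just like it would be posted to an io_context:
    asio::post(pool, [] { std::cout << "running on a pool thread\n"; });

    // I/O objects can use the pool as their execution context as well,
    // e.g. tcp::acceptor acceptor{make_strand(pool), {{}, port}};

    pool.join(); // waits for outstanding work to finish, then joins the threads
}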
You're right it has the same race condition - although without the UB, so that's nice.
Sidenote: you should probably stop calling it a "deadlock", because there is none. It's a soft-lock: you're just waiting for something that never happens (the completion of the re-armed async_accept). Deadlock is when multiple parties contend for locks in a way that can never be satisfied. This is just a soft-lock in the sense that a connection or even a network failure will allow the system to proceed.
So I also stripped the output but added BOOST_ASIO_ENABLE_HANDLER_TRACKING. The resulting handler-tracking picture confirms the exact explanation above.
From here the only obvious solution would seem to be: don't cancel the server from the client strand. By extension that would mean
- post a close instead of a cancel; this actively makes any subsequent async_accept an error,
- alternatively, have a state machine in the server that manually checks whether we should still be accepting before initiating a new async_accept (a sketch follows below).
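A rough sketch of that last option against the server class above (the accepting_ flag is an addition for illustration; it is only ever touched from handlers running on the acceptor's strand, so it needs no extra synchronization):
// Sketch only: a stop flag that lives on the acceptor's strand.
void acceptLoop() {
    acceptor_.async_accept([this](error_code errorCode, tcp::socket peer) {
        if (!accepting_)          // stop() already ran on this strand: don't re-arm
            return;
        if (!errorCode) {
            std::cout << "Accept: " << peer.remote_endpoint() << std::endl;
            acceptLoop();
        } else {
            std::cout << "Accept: " << errorCode.message() << std::endl;
        }
    });
}

void stop() {
    asio::post(acceptor_.get_executor(), [this] {
        accepting_ = false;       // later completions will not start a new accept
        acceptor_.cancel();       // cancel the accept that is outstanding right now, if any
    });
}

bool accepting_ = true;
Whichever order the strand ends up executing (the in-flight accept completion first or the posted stop first), the loop is no longer re-armed, so run() can return.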
Note that the .close() method, in code that uses the native handle, can lead to its own race condition (where another thread immediately opens a new file/socket with the reused file descriptor and the native code doesn't notice it's talking to the wrong socket). To be honest, this mostly seems to be a problem with stream sockets (not acceptors), and there it's very easy to fix with .shutdown(), so just be aware of it.
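For a stream socket that would look roughly like this (sketch only; teardown is not a function from the code above):
// Sketch: tear down a connected tcp::socket without racing on the native handle.
void teardown(tcp::socket& sock) {
    error_code ignored;
    sock.shutdown(tcp::socket::shutdown_both, ignored); // end communication; outstanding reads see eof
    sock.close(ignored);                                // only now release the OS descriptor
}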
I've never hit this problem in a lot of production use of ASIO. I guess that, in practice, your exact use case (.cancel() perfectly timed together with a new accept completion) doesn't come up a lot.
A similar use case that does come up quite a lot is with timers, which can also be tricky to handle race-free. There, the disambiguating factor is also extra state, in the form of basic_waitable_timer::expiry(). See e.g. Cancelling boost asio deadline timer safely.
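A tiny, single-threaded sketch of that expiry()-based disambiguation (the flow in the linked question differs; this just shows the check):
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
namespace asio = boost::asio;

int main() {
    asio::io_context io;
    asio::steady_timer timer(io, std::chrono::milliseconds(50));

    timer.async_wait([&](boost::system::error_code) {
        // The handler runs on a real expiry *and* when the timer is cancelled or
        // rescheduled; the expiry() state is what disambiguates the two cases.
        if (timer.expiry() <= asio::steady_timer::clock_type::now())
            std::cout << "deadline really passed\n";
        else
            std::cout << "timer was moved/cancelled before the deadline\n";
    });

    timer.expires_after(std::chrono::seconds(10)); // reschedule: cancels the pending wait

    io.run();
}
In the cancellation race from the question, the accepting flag plays the same role as the timer's expiry: extra state that the completion handler can consult.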
Upvotes: 3