pmf

Reputation: 7749

Thread safety of curl_multi_remove_handle

It seems like some sources recommend using curl_multi_remove_handle to "invalidate" a curl handle and cause curl_multi_wait to return early. This seems not to be covered under the thread safety guarantee (if done from another thread), or am I wrong (the thread safety guarantees are basically just reentrancy guarantees)?

What is the recommended way to signal curl_multi_wait to return early? Is it really required to do it via timeouts? (Under Linux, I would use an eventfd in the epoll set to effectively have the case "wait on these sockets OR this event fd OR the given timeout".) It seems I could use custom curl_waitfd structures, but this would require platform-specific setup for dummy sockets.

Upvotes: 1

Views: 1681

Answers (2)

Daniel Stenberg

Reputation: 58114

You must not call curl_multi_remove_handle from thread B if curl_multi_wait for that handle is running in thread A. That will just cause tears and misery.

You can opt to, for example:

  • use sufficiently short timeouts for curl_multi_wait() so that you don't need to abort it
  • add a private socket/file descriptor to send data on to abort when you want to
  • return an error from the progress callback (or another callback) for the transfer(s) you need to stop, by setting a flag that they all check (global, or global-like)
  • rework your app logic so that you can consider the transfer to be "dead" before it has actually stopped, and let libcurl run its course and close it later, so that you don't have to care much about it finishing a bit after you decided you can ignore it.

curl_multi_poll()

After I first wrote this answer, we introduced curl_multi_poll in libcurl. This function is very similar to curl_multi_wait, but it can also be made to return early by calling curl_multi_wakeup, which gives applications a few more alternative approaches.

Upvotes: 3

Kai Petzke

Reputation: 2984

Unfortunately, curl_multi is not what people these days would consider "thread safe". Yes, you can use a CURLM handle in two different threads, as long as they don't access it at the same time. But hey, this is true for almost any data structure in C or C++.

So, if you have one thread running an event loop with curl_multi_wait(), you cannot use a second thread to add new jobs via curl_multi_add_handle() or remove jobs via curl_multi_remove_handle(). Well, it will work most of the time, but especially under high load, you will start getting data corruption and segfaults due to the concurrent access to libcurl's internal data structures.

There are two ways around this problem, but both require a bit of coding:

  • Use the newer curl_multi_poll() interface, which (unlike curl_multi_wait()) is externally interruptible via curl_multi_wakeup(). Yes, curl_multi_wakeup() is the ONLY function on CURLM handles that is safe to call concurrently from another thread (or even multiple threads). To add new requests to the event loop or remove requests from it, you would need a request queue and a mutex that protects access to that queue. Then, to add a new job, you would do:

    • (thread 1 is running curl_multi_poll() in an endless loop)
    • thread 2 acquires said mutex
    • thread 2 posts an "add easy handle request" into the request queue
    • thread 2 releases said mutex again
    • thread 2 calls curl_multi_wakeup()
    • thread 1 acquires the mutex after curl_multi_poll() returns
    • thread 1 then processes the "add easy handle request" from the request queue and performs curl_multi_add_handle()
    • thread 1 then releases the mutex again
    • thread 1 does all other necessary work (in particular call curl_multi_perform() and pass finished transfers to the application etc.)
    • thread 1 calls curl_multi_poll() again

    To remove a job, you would use the same procedure: just let thread 2 post a "remove easy handle request" instead of an "add easy handle request" to the request queue, and then let thread 1 call curl_multi_remove_handle() instead of curl_multi_add_handle().

    In this solution, ALL calls to the CURLM handle are performed from thread 1, with the sole exception of curl_multi_wakeup(), which is used by other threads to signal thread 1 of new work waiting in the request queue.

  • Or use the curl_multi_socket_action() interface, where you have to provide two callbacks to libcurl, through which it reports the file descriptors to watch and a timeout to your application. You then have to call epoll() or a similar OS function yourself to wait for activity (or timeout) in the event loop thread. Then add a mutex again to serialize access to the CURLM handle: your event loop thread should lock that mutex just before it calls curl_multi_socket_action() (or any other function on the CURLM handle) and unlock it immediately after. As curl_multi_socket_action() (unlike curl_multi_poll()) does not sleep, that mutex will be locked only for brief intervals. So other threads can then easily lock that mutex for themselves, too, and call curl_multi_add_handle() or curl_multi_remove_handle() directly as needed. Be aware, though, that those intervening additions or removals of handles can modify the active FD set, and that you may need some synchronisation with the event loop thread to notify it of the modified epoll() set.

The first solution is likely easier to implement. You should be able to find libcurl wrappers for both variants on GitHub, but be sure to test them thoroughly before using them in any critical application.

Upvotes: 2
