Tony The Lion

Reputation: 63240

Efficient structure for Multiple Thread access

I need to implement a mechanism built around a data structure (a queue at the moment) that holds a list of pending request objects. Requests are marked by different threads when in use and removed once a thread has finished with them.

This data structure could hold up to a few thousand items at any given time, and N threads will be taking requests from it (essentially marking them as 'taken'); when a thread is finished, it will find that same request in the structure and remove it.

Now I was wondering how well a C++ STL queue (std::queue) would work for this, particularly when it comes to finding the same item again in order to remove it from the queue?

I don't want this data structure to be held by a thread synchronization mechanism for too long just because a thread is searching for an item somewhere within it; that could stall my entire program. (The program needs to be very fast and high-performance.)

Can anyone advise on how best to implement this in a multi-threaded environment so that the structure isn't locked for long periods when a search needs to be performed?

Upvotes: 2

Views: 463

Answers (3)

StackedCrooked

Reputation: 35485

Quoting Herb Sutter:

Linked lists are wonderfully concurrency-friendly data structures because they support highly localized updates. In particular, as illustrated in Figure 1, to insert a new node into a doubly linked list, you only need to touch two existing nodes; namely, the ones immediately adjacent to the position the new node will occupy to splice the new node into the list. To erase a node, you only need to touch three nodes: the one that is being erased, and its two immediately adjacent nodes.
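To make the quoted point concrete, here is a minimal sketch (with a hypothetical Node type, not taken from Sutter's article) of splicing a node into a doubly linked list; only the two neighbouring nodes are touched:

```cpp
// Minimal doubly linked list node, purely for illustration.
struct Node {
    Node* prev = nullptr;
    Node* next = nullptr;
    int value = 0;
};

// Insert `n` immediately after `pos`; only pos and pos->next are modified,
// which is why the update stays so localized.
void insert_after(Node* pos, Node* n) {
    n->prev = pos;
    n->next = pos->next;
    if (pos->next) pos->next->prev = n;
    pos->next = n;
}
```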

That aside, I agree with the comments that you should probably remove the item from the queue before processing it. But I may be wrong as I don't know the finer details of your application.

Upvotes: 1

Steve Townsend

Reputation: 54168

You may be focusing on what is not the hardest part of your design here.

If the queue is FIFO without any prioritization, then your accessors are going to be push_back() and pop_front(). These are very fast even if you don't go to the trouble of using compare-and-swap (CAS) semantics and just stick with a simple mutex/critical section. If you need the ability to prioritize traffic, things get harder. If you do go the CAS route, then (on Windows anyway) you cannot improve on boost::thread's shared_mutex without spending far too much time on this part of your coding. I'm not sure about the non-Windows implementations.
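As a rough sketch of the simple mutex approach (not part of the original answer; Request is a placeholder type, and std::optional assumes C++17 is available):

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

struct Request { /* pending-request payload goes here */ };

class RequestQueue {
public:
    void push(Request r) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(r));
    }

    // Returns the front item if one is available. The lock is held only for
    // the few instructions needed to move it out, never while processing.
    std::optional<Request> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        Request r = std::move(items_.front());
        items_.pop_front();
        return r;
    }

private:
    std::mutex mutex_;
    std::deque<Request> items_;
};
```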

The more complex part of this problem is typically signalling idle worker threads to pick up new work. You can't have them spinning in a loop until the queue is non-empty, so you need a way to ensure the right number of idle threads get kicked to pick up queued items. When a worker thread goes idle it can check for new work and execute it if there is any; if not, the queue state needs to be set to idle so that the next push_back results in a "wake up" kick to restart the worker thread pool. This area has to be 100% robust to all non-fatal exceptions, or your process will go dark.
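One common way to implement that "wake up" kick is a condition variable. The following is a hedged sketch along those lines rather than anything from the original answer (Request is again a placeholder type):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

struct Request { /* pending-request payload goes here */ };

class WorkQueue {
public:
    void push(Request r) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(r));
        }
        cv_.notify_one();  // kick exactly one idle worker
    }

    // Blocks until an item is available (or shutdown), so workers never spin
    // on an empty queue. Returns false when woken for shutdown.
    bool wait_pop(Request& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();  // wake every worker so they can exit cleanly
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Request> items_;
    bool stop_ = false;
};
```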

Are you managing your own threads or using a built-in thread pool? Are you planning to have a dynamically-sized thread pool or just spawn N threads (configurable presumably) and have them run until process exit?

Definitely have the worker threads do the logging of work-item processing. Understanding who owns a workitem at any point in its lifecycle is vital. Logging work start/stop, plus a workitem summary and timings, is going to be useful. If logging is slow, push it off to a separate thread via a fire-and-forget queue, but then you have to watch out for latency there making your log less useful. If you do need the ability to externally manipulate in-progress workitems, then a structure separate from your queue of pending work - in-progress work items indexed by thread, showing current status and start time, with its own locking - sounds like a good idea. This structure is only O(thread count), so it is smaller than the "pending" queue, and scanning it is not likely to be a bottleneck provided any long-running resultant operations are done outside the structure's lock.
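A hedged sketch of that separate in-progress structure, keyed by thread id and guarded by its own mutex so scanning it never contends with the pending-work queue (names such as InProgressItem are illustrative, not from the original post):

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>

struct InProgressItem {
    std::string description;                           // what is being worked on
    std::chrono::steady_clock::time_point start_time;  // for timing / timeout checks
};

class InProgressTable {
public:
    void start(const std::string& description) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_[std::this_thread::get_id()] =
            InProgressItem{description, std::chrono::steady_clock::now()};
    }

    void finish() {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.erase(std::this_thread::get_id());
    }

    // O(thread count) scan, e.g. for a timeout monitor; keep the work done
    // under the lock small, as advised above.
    template <typename Fn>
    void for_each(Fn fn) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& kv : items_) fn(kv.first, kv.second);
    }

private:
    mutable std::mutex mutex_;
    std::map<std::thread::id, InProgressItem> items_;
};
```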

Regarding performance - what are your worker threads going to be doing? If work items are long-running, or do a lot of I/O or other expensive operations, then the queue interaction is not your performance bottleneck anyway, so over-optimizing that area is relatively unproductive. Consider the performance of the entire system in your design, not just one small area.

This is just for starters. Good luck, this is not an easy system to design robustly.

[EDIT] based on workitem description.

Parsing should be quick (although it may involve costly source data retrieval - hard to say?), DB access less so. It sounds like tuning the DB may give you the biggest bang per buck performance-wise. If you don't have control of that, then you just have to mitigate a slow DB in your design as much as possible. If you have the option of async DB access, the worker thread could do just enough work to kick off the DB call and then complete the work in a callback, allowing other work to be kicked off on the worker thread. Without async DB access, a reliable request timeout will be hard to implement without some alternative form of indirection whereby your main worker thread does not wait for the DB calls to complete inline. You need to decouple your main worker threads from reliance on the DB unless you can trust the DB to return or error out in a timely way. Maybe a configurable or workitem-specific timeout on the DB request? DB API libraries often allow this.
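As an illustration of a per-workitem timeout when only a blocking DB call is available (run_db_query and the surrounding names are hypothetical; note the timed-out call keeps running in the background because it cannot be forcibly cancelled from here):

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical blocking DB call; stands in for whatever the real DB API exposes.
std::string run_db_query(const std::string& sql);

// Run the query on another thread and stop waiting once the workitem's timeout
// expires. Returns false on timeout; this only bounds how long the worker waits,
// the detached call itself still runs to completion.
bool query_with_timeout(const std::string& sql,
                        std::chrono::milliseconds timeout,
                        std::string& result) {
    std::packaged_task<std::string()> task([sql] { return run_db_query(sql); });
    auto fut = task.get_future();
    std::thread(std::move(task)).detach();

    if (fut.wait_for(timeout) != std::future_status::ready)
        return false;  // caller marks the workitem as timed out

    result = fut.get();
    return true;
}
```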

Your timeout monitor would need to stay aware of the workitem state. Possibly some virtual Cancel() method on your workitem, to ensure flexibility in cleanup of timed-out items.
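The Cancel() hook might take a shape like this (purely illustrative; the interface is not from the original post):

```cpp
// Workitem interface with the virtual Cancel() hook mentioned above, so the
// timeout monitor can request type-specific cleanup.
class WorkItem {
public:
    virtual ~WorkItem() = default;
    virtual void Execute() = 0;
    // Called by the timeout monitor from another thread; implementations
    // release DB handles, temp files, etc., and must be thread-safe.
    virtual void Cancel() = 0;
};
```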

Upvotes: 1

graham.reeds

Reputation: 16476

Take a long look at Herb Sutter's "Effective Concurrency" series (soon to be a book).

Always remove items from the queue before consuming them - you don't eat apples while they're still on the tree, do you?

In short: when removing items from the queue/singly linked list, use a compare-and-swap atomic operation (in Windows parlance, InterlockedCompareExchangePointer). This guarantees that at least one thread is always making forward progress. There are probably similar functions in Boost.
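A hedged sketch of what a CAS-based removal from the head of a singly linked list can look like in portable C++ (std::atomic rather than the Windows intrinsic; a real implementation must also handle the ABA problem and safe reclamation of popped nodes):

```cpp
#include <atomic>

// Minimal node type, purely for illustration.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

Node* pop() {
    Node* old_head = head.load();
    // Retry until the CAS succeeds; each failed attempt means another thread
    // made progress, so the system as a whole never stalls.
    while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {
        // old_head has been reloaded with the current head; loop and retry.
    }
    return old_head;  // caller now owns the node (nullptr if the list was empty)
}
```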

Also move the logging into the class doing the consuming.

Upvotes: 0
