Reputation: 33847
How should I use `new` in a multithreaded environment?
Precisely: I have a piece of code that I run with 40 threads. Each thread invokes `new` a few times. I noticed that performance drops, probably because the threads contend on a lock inside `new` (significant time is spent in `__lll_lock_wait_parallel` and `__lll_unlock_wait_parallel`). What is the best alternative to `new`/`delete` I can use?
Upvotes: 2
Views: 3635
Reputation: 1401
I think you should use a memory pool: allocate all the memory you need (if the sizes are fixed) once, when your program starts, and let the threads take the memory they need from that first allocation.
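A minimal sketch of that suggestion, assuming fixed-size slots: one big block is allocated at startup and slots are handed out from a free list, so threads never touch the global heap afterwards. All names here are illustrative, not a specific library.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// One up-front allocation; acquire()/release() recycle fixed-size slots.
class FixedPool {
public:
    FixedPool(std::size_t slot_size, std::size_t slot_count)
        : storage_(slot_size * slot_count) {
        for (std::size_t i = 0; i < slot_count; ++i)
            free_list_.push_back(storage_.data() + i * slot_size);
    }
    void* acquire() {
        std::lock_guard<std::mutex> lock(mtx_);  // short critical section
        if (free_list_.empty()) return nullptr;  // pool exhausted
        void* p = free_list_.back();
        free_list_.pop_back();
        return p;
    }
    void release(void* p) {
        std::lock_guard<std::mutex> lock(mtx_);
        free_list_.push_back(static_cast<char*>(p));
    }
private:
    std::vector<char> storage_;   // the single startup allocation
    std::mutex mtx_;
    std::vector<char*> free_list_;
};
```

There is still a lock, but it is held only for a push or pop on the free list, which is far cheaper than a general-purpose allocation.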
Upvotes: 2
Reputation: 1
Since nobody mentioned it, I might also suggest trying Boehm's conservative garbage collector; this means using `new(gc)` instead of `new` and `GC_malloc` instead of `malloc`, and not bothering to `free` or `delete` memory objects. A couple of years ago I measured `GC_malloc` versus `malloc`; it was a bit slower (perhaps 25µs for `GC_malloc` versus 22µs for system `malloc`).
I have no idea of the performance of Boehm's GC in multi-threaded usage (but I do know it can be used in multi-threaded applications).
Boehm's GC has the advantage that you don't have to care about `free`-ing your data.
Upvotes: 0
Reputation: 2611
1st, do you really have to `new` that thing? Why not use a local variable or a per-thread heap object?
2nd, have a look at http://en.wikipedia.org/wiki/Thread-local_storage if your development environment supports it...
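A minimal sketch of the thread-local-storage idea, using the C++11 `thread_local` keyword (older toolchains spell it `__thread` with GCC or `__declspec(thread)` with MSVC). The names are illustrative:

```cpp
#include <cstdint>
#include <thread>

// Each thread gets its own private copy of this buffer, so using it
// requires no locking at all.
thread_local char scratch[4096];

// Returns the address of the calling thread's private scratch buffer.
std::uintptr_t scratch_addr() {
    scratch[0] = 1;  // never races: every thread touches its own copy
    return reinterpret_cast<std::uintptr_t>(scratch);
}
```

Calling `scratch_addr()` from two different threads returns two different addresses, confirming that each thread works on its own instance.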
Upvotes: 1
Reputation: 24877
I tend to use object pools in servers and other such apps that are characterized by continual and frequent allocation and release of large numbers of a few sets of objects (in servers: socket, buffer and buffer-collection classes). The pools are queues, created at startup with an appropriate number of instances pushed on (e.g. my server: 24000 sockets, 48000 collections and an array of 7 pools of buffers of varying size/count). Popping an object instance off a queue and pushing it back on is much quicker than new/delete, even if the pool queue has a lock because it is shared across the threads (the smaller the lock span, the smaller the chance of contention). My pooled-object class (from which all the sockets etc. are inherited) has a private 'myPool' member (loaded at startup) and a parameterless 'release()' method, so any buffer is easily and correctly returned to its own pool. There are issues:
1) Ctor and dtor are not called upon allocate/release, so allocated objects contain all the gunge left over from their last use. This can occasionally be useful (e.g. re-usable socket objects), but generally means that care needs to be taken over, say, the initial state of booleans, the values of ints, etc.
2) A pool per thread has the greatest performance-improvement potential - no locking required - but in systems where the loading on each thread is intermittent, this can be a waste of objects. I never seem to be able to get away with this, mainly because I use pooled objects for inter-thread comms, so release() has to be thread-safe anyway.
3) Elimination of 'false sharing' on shared pools can be awkward - each instance should be initially 'newed' so as to exclusively use up an integer number of cache pages. At least this only has to be done once at startup.
4) If the system is to be resilient upon a pool running out, either more objects need to be allocated to add to the pool when needed, (the pool size is then creeping up), or a producer-consumer queue can be used so that threads block on the pool until objects are released, (P-C queues are slower because of the condvar/semaphore/whatever for waiting threads to block on, also threads that allocate before releasing can deadlock on an empty pool).
5) Monitoring of the pool levels during development is required so that object leakages and double-releases can be detected. Code/data can be added to the objects/pools to detect such errors as they happen but this compromises performance.
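The scheme above can be sketched as follows; the class and member names (`Pooled`, `myPool`, `release()`, `Buffer`) mirror the description but are my own illustrative choices, not the author's actual code. The lock spans only a queue push or pop, and `release()` takes no parameters because each object remembers its own pool:

```cpp
#include <deque>
#include <mutex>

class Pool;

// Base class for all poolable objects; release() returns *this to its
// owning pool without the caller needing to know which pool that is.
class Pooled {
public:
    void release();
protected:
    friend class Pool;
    Pool* myPool = nullptr;  // loaded when the pool is filled at startup
};

class Pool {
public:
    void push(Pooled* obj) {
        std::lock_guard<std::mutex> lock(mtx_);  // small lock span
        obj->myPool = this;
        queue_.push_back(obj);
    }
    Pooled* pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (queue_.empty()) return nullptr;  // pool ran dry (issue 4)
        Pooled* obj = queue_.front();
        queue_.pop_front();
        return obj;  // no ctor runs: contents are stale (issue 1)
    }
private:
    std::mutex mtx_;
    std::deque<Pooled*> queue_;
};

void Pooled::release() { myPool->push(this); }

// Example pooled object: a reusable buffer.
class Buffer : public Pooled {
public:
    char data[4096];
};
```

Returning `nullptr` on an empty pool corresponds to the non-resilient case in point 4; a producer-consumer queue would block there instead.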
Upvotes: 1
Reputation: 75665
Even if you are using the `new` operator, it is using `malloc` underneath to do the allocation and deallocation. In these circumstances the focus should be on the allocator, not on the API used to reach it.
TCMalloc is a malloc created at Google specifically for good performance in a multi-threaded environment. It is part of google-perf-tools.
Another malloc you might look at is Hoard. It has much the same aims as TCMalloc.
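For reference, a typical way to try a drop-in allocator such as TCMalloc; the library path and names below are assumptions and vary by distribution and version:

```shell
# Link against tcmalloc at build time:
g++ -O2 app.cpp -o app -ltcmalloc

# Or inject it into an existing binary via the dynamic loader:
LD_PRELOAD=/usr/lib/libtcmalloc.so ./app
```

No source changes are needed either way, since both approaches replace the `malloc`/`free` that `new`/`delete` call underneath.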
Upvotes: 6
Reputation: 182664
I don't know about "the best", but I would try a few things:

- Reduce the frequency of allocations/frees (might be hard). Just waste memory (but don't leak) if it improves performance.
- Roll my own per-thread allocator and always alloc/free from the same thread, using `mmap` for the real memory.

To roll your own primitive allocator, `mmap` a large chunk of memory from the OS. I don't consider this trivial to do, but if done right it could improve performance. The hairiest part is by far keeping track of the allocations, preventing fragmentation, etc.
A simple implementation is provided in "The C Programming Language", near the end of the book (but it uses `brk`, IIRC).
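A very primitive sketch of the roll-your-own idea (POSIX only, names my own): each thread owns an arena obtained with `mmap` and bumps a pointer for each allocation. There is no per-object free at all - the whole arena is unmapped at once - which sidesteps the tracking and fragmentation problems mentioned above, at the cost of generality:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// One arena per thread: no locking, since only the owner allocates from it.
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes) : size_(bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(p != MAP_FAILED);  // real error handling elided in this sketch
        base_ = static_cast<char*>(p);
        next_ = base_;
    }
    ~BumpArena() { munmap(base_, size_); }  // everything freed at once

    void* alloc(std::size_t n) {
        n = (n + 15) & ~std::size_t(15);  // keep returned pointers aligned
        if (n > size_ - static_cast<std::size_t>(next_ - base_))
            return nullptr;               // arena exhausted
        void* p = next_;
        next_ += n;
        return p;
    }
private:
    char* base_;
    char* next_;
    std::size_t size_;
};
```

This only suits workloads where all of a thread's allocations can be discarded together; a general-purpose replacement needs the free-list bookkeeping the answer warns about.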
Upvotes: 5