Thread-cached object referencing

Question

I need to create a kind of shared, reference-counted object (for whatever reason). Its use is not limited to a single thread. In such cases interlocked operations are generally the way to go (for example, InterlockedIncrement and InterlockedDecrement on Win32).
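To make the baseline concrete, here is a minimal sketch of the usual interlocked approach. It uses std::atomic as a portable stand-in for InterlockedIncrement/InterlockedDecrement; the class name SharedObject and its methods are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference-counted base class (illustrative names).
class SharedObject {
public:
    SharedObject() : refs_(1) {}            // the creator holds the first reference
    virtual ~SharedObject() = default;

    void AddRef() {
        // Atomic increment; roughly the role InterlockedIncrement plays on Win32.
        refs_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true if this call dropped the last reference and destroyed the object.
    bool Release() {
        // Atomic decrement; acq_rel so the destroying thread sees earlier writes
        // made by other threads before they released their references.
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
            return true;
        }
        return false;
    }

    long UseCount() const { return refs_.load(std::memory_order_relaxed); }

private:
    std::atomic<long> refs_;
};
```

Every AddRef and Release here pays for an atomic read-modify-write, even when only one thread ever touches the object. That is the cost I'd like to avoid.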

While the reference counting must work correctly in every scenario, I'd like to optimize it for single-threaded usage. Interlocked operations are much heavier than ordinary arithmetic ones. In my measurements an interlocked operation (which issues a full memory barrier) takes about 40 CPU cycles on my "typical" CPU, whereas ordinary arithmetic operations are too cheap to measure (thanks to the CPU cache).

A similar technique exists for memory allocation. Some heap implementations, such as TCMalloc, consist of a centralized memory-partitioning mechanism, guarded by appropriate synchronization objects, plus per-thread caches. In the most common scenario memory is allocated from and freed to the per-thread cache, which involves no interlocked operations at all, and the CPU cache is likely to be hit.
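The per-thread caching pattern can be sketched roughly as follows. This is not TCMalloc's actual design: it is a toy fixed-size block pool, and every name in it (CentralPool, ThreadCache, Allocate, Free, kBlockSize, kMaxCached) is an assumption made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

constexpr std::size_t kBlockSize = 64;   // hypothetical fixed block size
constexpr std::size_t kMaxCached = 32;   // arbitrary per-thread cache limit

// Central pool: shared by all threads, so every access takes the mutex.
class CentralPool {
public:
    ~CentralPool() { for (void* p : free_) ::operator delete(p); }
    void* Get() {
        std::lock_guard<std::mutex> lock(mu_);
        if (free_.empty()) return ::operator new(kBlockSize);
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void Put(void* p) {
        std::lock_guard<std::mutex> lock(mu_);
        free_.push_back(p);
    }
private:
    std::mutex mu_;
    std::vector<void*> free_;
};

inline CentralPool& Central() { static CentralPool pool; return pool; }

// Per-thread cache: touched only by its owning thread, so no locking is needed.
struct ThreadCache {
    std::vector<void*> blocks;
    ~ThreadCache() { for (void* p : blocks) Central().Put(p); }  // hand back on thread exit
};

inline ThreadCache& Cache() { thread_local ThreadCache cache; return cache; }

inline void* Allocate() {
    std::vector<void*>& c = Cache().blocks;
    if (!c.empty()) {                 // fast path: no synchronization at all
        void* p = c.back();
        c.pop_back();
        return p;
    }
    return Central().Get();           // slow path: go to the guarded central pool
}

inline void Free(void* p) {
    std::vector<void*>& c = Cache().blocks;
    if (c.size() < kMaxCached) c.push_back(p);   // fast path
    else Central().Put(p);                       // cache full: return to central pool
}
```

In the common case a block freed by a thread is handed straight back to the same thread on its next allocation, without touching the mutex.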

Hence I thought about doing something similar for reference-counted objects. Any ideas on how to achieve this? Raw ideas are also welcome.

It's OK in my scenario to delay the actual object destruction for some time if that improves performance.
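To show the kind of thing I have in mind, here is one raw sketch, not a vetted design. It rests on a strong assumption: every reference is released on the same thread that acquired it. Each thread pays for one atomic increment the first time it touches an object, counts further references with plain arithmetic, and gives its shared unit back only in an explicit Flush(), which is where the delayed destruction happens. All names (Counted, LocalRefs, AddRef, Release, Flush) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <unordered_map>

// Hypothetical shared object: 'shared' holds one unit per thread that
// currently has the object in its per-thread table.
struct Counted {
    std::atomic<long> shared{0};
    virtual ~Counted() = default;
};

// Per-thread table: object -> number of references held by this thread.
inline std::unordered_map<Counted*, long>& LocalRefs() {
    thread_local std::unordered_map<Counted*, long> refs;
    return refs;
}

inline void AddRef(Counted* obj) {
    auto res = LocalRefs().emplace(obj, 0L);
    if (res.second) {
        // First touch on this thread: the only atomic operation on this path.
        obj->shared.fetch_add(1, std::memory_order_relaxed);
    }
    ++res.first->second;              // plain, non-atomic increment
}

inline void Release(Counted* obj) {
    auto it = LocalRefs().find(obj);
    // Assumption of this sketch: released on the thread that acquired it.
    assert(it != LocalRefs().end() && it->second > 0);
    --it->second;                     // plain decrement; the shared unit is kept until Flush()
}

// Returns this thread's shared unit for every object it no longer references.
// Destroys objects whose shared count drops to zero; returns how many were destroyed.
inline int Flush() {
    int destroyed = 0;
    auto& refs = LocalRefs();
    for (auto it = refs.begin(); it != refs.end();) {
        if (it->second != 0) { ++it; continue; }
        Counted* obj = it->first;
        it = refs.erase(it);
        if (obj->shared.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete obj;
            ++destroyed;
        }
    }
    return destroyed;
}
```

Two caveats. The hash-table lookup may well cost more than the roughly 40 cycles it is meant to save, so a real version would need a cheaper way to find the per-thread counter. A thread that exits without calling Flush() leaks its shared units. Handling references that migrate between threads is the part I don't know how to do cheaply.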

