stgtscc

Reputation: 990

Multithreaded read-many, write-seldom array/vector iteration in C++

I need to almost constantly iterate over a sequence of structs in a read-only fashion, but for every 1M+ reads, one of the threads may append an item. I think a mutex would be overkill here, and I have also read that r/w locks have their own drawbacks for the readers.

I was thinking about using reserve() on a std::vector, but the answer to "Iterate over STL container using indices safe way to avoid using locks?" seemed to invalidate that.

Any ideas on what way might be fastest? The most important thing is for the readers to be able to quickly and efficiently iterate with as little contention as possible. The writing operations aren't time-sensitive.

Update: Another one of my use cases is that the "list" could contain pointers rather than structs, i.e. std::vector<MyClass*>. The same requirements apply.

Update 2: Hypothetical example

globally accessible:

typedef std::vector<MyClass*> Vector;
Vector v;
v.reserve(50);

Reader threads 1-10: (these run pretty much all the time)

.
.
int total = 0;
for (Vector::const_iterator it = v.begin(); it != v.end(); ++it)
{
   MyClass* ptr = *it;
   total += ptr->getTotal();
}
// do something with total
.
.

Writer threads 11-15:

MyClass* ptr = new MyClass();
v.push_back(ptr);

That's basically what happens here. Threads 1-15 could all be running concurrently, although generally there are only 1-2 reader threads and 1-2 writer threads.

Upvotes: 1

Views: 512

Answers (2)

user2116939

Reputation: 454

What I think could work here is your own wrapper around a vector, something like this:

template <typename T> class Vector
{
// constructor will be needed of course
public:
    std::shared_ptr<const std::vector<T> > getVector()
        { return mVector; }
    void push_back(const T&);

private:
    std::shared_ptr<std::vector<T> > mVector;
};

Then, whenever readers need to access a specific Vector, they should call getVector() and keep the returned shared_ptr until finished reading.
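
For illustration, the reader side in the question's example could then look something like this (a sketch only; sumTotals is just an illustrative name, MyClass comes from the question, and Vector is the wrapper sketched above):

int sumTotals(Vector<MyClass*>& v)
{
    // Take the snapshot once, then iterate it freely; it stays valid even if
    // a writer swaps in new storage in the meantime.
    std::shared_ptr<const std::vector<MyClass*> > snapshot = v.getVector();

    int total = 0;
    for (std::vector<MyClass*>::const_iterator it = snapshot->begin();
         it != snapshot->end(); ++it)
    {
        total += (*it)->getTotal();
    }
    return total;
    // 'snapshot' is released here; if it held the last reference to an old
    // storage block, that block gets deallocated at this point.
}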

But writers should always use Vector's push_back to add a new value. This push_back should then check whether mVector->size() == mVector->capacity() and, if so, allocate a new vector and assign it to mVector. Something like:

template <typename T> void Vector<T>::push_back(const T& t)
{
    if (mVector->size() == mVector->capacity())
    {
        // make certain here that new_size > old_size
        std::vector<T>* vec = new std::vector<T>();
        vec->reserve(mVector->size() * SIZE_MULTIPLIER);

        std::copy(mVector->begin(), mVector->end(), std::back_inserter(*vec));
        mVector.reset(vec);
    }
    // put 't' into 'mVector'. 'mVector' is guaranteed not to reallocate now.
    mVector->push_back(t);
}

The idea here is inspired by the RCU (read-copy-update) algorithm. When the storage space is exhausted, allocating new storage must not invalidate the old storage as long as there is at least one reader still accessing it. The new storage is allocated anyway, and any reader arriving after the allocation sees it. The old storage is deallocated as soon as no one is using it anymore (all of its readers are finished).

Since most hardware architectures provide atomic increment and decrement instructions, I think shared_ptr (and thus Vector) will be able to run completely lock-less.
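
For illustration, a simplified copy-on-every-append variant shows the lock-free handoff that shared_ptr's atomic reference count makes possible. This differs from the capacity-check version above; it assumes the C++11 std::atomic_load/std::atomic_store overloads for shared_ptr from <memory>, writers that are serialized externally, and SnapshotVector is just an illustrative name:

#include <memory>
#include <vector>

template <typename T>
class SnapshotVector   // illustrative name
{
public:
    SnapshotVector() : mVector(std::make_shared<std::vector<T> >()) {}

    // Readers: grab an immutable snapshot and iterate it without locking.
    std::shared_ptr<const std::vector<T> > getVector() const
    {
        return std::atomic_load(&mVector);
    }

    // Writer: copy, append, publish. A published vector is never mutated,
    // so readers holding an old snapshot are unaffected; the old storage is
    // freed when its last reader lets go of it.
    void push_back(const T& t)
    {
        std::shared_ptr<const std::vector<T> > old = std::atomic_load(&mVector);
        std::shared_ptr<std::vector<T> > next =
            std::make_shared<std::vector<T> >(*old);
        next->push_back(t);
        std::atomic_store(&mVector, std::shared_ptr<const std::vector<T> >(next));
    }

private:
    std::shared_ptr<const std::vector<T> > mVector;
};

Given the stated ratio of roughly one append per million reads, the extra copy per append should be negligible.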

One disadvantage to this approach though, is that depending on how long readers hold that shared_ptr you might end up with several copies of your data.

PS: hope I haven't made too many embarrassing errors in the code :-)

Upvotes: 4

Useless

Reputation: 67752

... using reserve() on a std::vector ...

This can only be useful if you can guarantee the vector will never need to grow. You've stated that the number of items is not bounded above, so you can't give that guarantee.

Notwithstanding the linked question, you could conceivably use std::vector just to manage memory for you, but it would take an extra layer of logic on top to work around the problems identified in the accepted answer.


The actual answer is: the fastest thing to do is minimize the amount of synchronization. What the minimal amount of synchronization is depends on details of your code and usage that you haven't specified.


For example, I sketched a solution using a linked-list of fixed-size chunks. This means your common use case should be as efficient as an array traversal, but you're able to grow dynamically without re-allocating.
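
Something along these lines, for illustration (assuming C++11 <atomic>, writers that are serialized externally, and no removals; ChunkedList and CHUNK_SIZE are just illustrative names, and the destructor is omitted for brevity):

#include <atomic>
#include <cstddef>

template <typename T, std::size_t CHUNK_SIZE = 64>
class ChunkedList
{
    struct Chunk
    {
        T items[CHUNK_SIZE];
        std::atomic<std::size_t> count;   // number of published slots
        std::atomic<Chunk*> next;         // next chunk, if any
        Chunk() : count(0), next(nullptr) {}
    };

public:
    ChunkedList() : mHead(new Chunk), mTail(mHead) {}

    // Writer: fill the current chunk; when it is full, link a fresh one.
    // A slot is published by bumping 'count' after the slot is written.
    void push_back(const T& t)
    {
        Chunk* tail = mTail;
        std::size_t n = tail->count.load(std::memory_order_relaxed);
        if (n == CHUNK_SIZE)
        {
            Chunk* fresh = new Chunk;
            fresh->items[0] = t;
            fresh->count.store(1, std::memory_order_release);
            tail->next.store(fresh, std::memory_order_release);
            mTail = fresh;
            return;
        }
        tail->items[n] = t;
        tail->count.store(n + 1, std::memory_order_release);
    }

    // Readers: walk chunk by chunk; within a chunk this is a plain array
    // traversal, which keeps the common case cache-friendly.
    template <typename Func>
    void for_each(Func f) const
    {
        for (const Chunk* c = mHead; c != nullptr;
             c = c->next.load(std::memory_order_acquire))
        {
            const std::size_t n = c->count.load(std::memory_order_acquire);
            for (std::size_t i = 0; i != n; ++i)
                f(c->items[i]);
        }
    }

private:
    Chunk* mHead;   // fixed after construction
    Chunk* mTail;   // touched only by the writer
};

A reader's loop from the question then becomes a call to for_each accumulating the totals, while the occasional writer calls push_back.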

However, the implementation turns out to be sensitive to questions like:

  • whether you need to remove items
    • whenever they're read?
    • only from the front or from other places?
  • whether you want the reader to busy-wait if the container is empty
    • whether this should use some kind of backoff
  • what degree of consistency is required?

Upvotes: 0
