Eumcoz

Reputation: 2458

Updating cache without blocking

I currently have a program with a cache-like mechanism. I have a thread listening for updates to this cache from another server; when it receives an update, it applies it to the cache. Here is some pseudo code:

void cache::update_cache()
{
    cache_ = new std::map<std::string, value>();
    while(true)
    {
        if(recv().compare("update") == 0)
        {
            std::map<std::string, value> *new_info = new std::map<std::string, value>();
            std::map<std::string, value> *tmp;
            //Get new info, store in new_info
            tmp = cache_;
            cache_ = new_info;
            delete tmp;
        }
    }
}

std::map<std::string, value> *cache::get_cache()
{
    return cache_;
}

cache_ is read from many different threads concurrently. I believe that, the way I have it here, I will run into undefined behavior if one of my threads calls get_cache(), then my cache updates (deleting the old map), and then the thread tries to access the stored pointer.

I am looking for a way to avoid this problem. I know I could use a mutex, but I would rather not block reads, as they have to be as low-latency as possible; if need be, though, I can go that route.

I was wondering if this would be a good use case for a unique_ptr. Is my understanding correct that if get_cache returns a unique_ptr instead of a raw pointer, then once all threads that hold the old version of the cache are finished with it (i.e. it leaves scope), the object will be deleted?

Is using a unique_ptr the best option for this case, or is there another option that I am not thinking of?

Any input will be greatly appreciated.

Edit:

I believe I made a mistake in my OP. I meant to use and pass a shared_ptr, not a unique_ptr, for cache_. When all threads are finished with an old cache_, the shared_ptr should delete the map it owns.
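To illustrate what I mean, here is a minimal, single-threaded sketch (it ignores the synchronization of the swap itself, which is the part I am asking about):

#include <map>
#include <memory>
#include <string>

struct value { int data; }; // stand-in for my real value type

int main()
{
    auto cache = std::make_shared<std::map<std::string, value>>();

    {
        // A reader takes a snapshot; the map now has two owners.
        std::shared_ptr<std::map<std::string, value>> snapshot = cache;

        // The updater publishes a fresh map; the old map survives
        // because the reader's snapshot still owns it.
        cache = std::make_shared<std::map<std::string, value>>();

    } // snapshot leaves scope here and, as the last owner, deletes the old map
}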

A little about my program: it is a webserver that uses this information to decide what to return. It is fairly high-throughput (thousands of req/sec). Each request queries the cache once, so telling my other threads when to update is no problem. I can tolerate slightly out-of-date information, and would prefer that over blocking all of my threads from executing if possible. The information in the cache is fairly large, so I would like to limit any copies of value.

update_cache is only ever called once; it runs in a thread that just listens for an update command and executes the code above.
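For context, this is roughly how that thread is started (a sketch; the real code differs):

cache c;
std::thread updater{[&c] { c.update_cache(); }}; // loops forever; requires <thread>
updater.detach();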

Upvotes: 0

Views: 483

Answers (2)

Casey

Reputation: 42554

shared_ptr is very reasonable for this purpose; C++11 has a family of free functions (std::atomic_load, std::atomic_store, etc.) for handling shared_ptr atomically. If the data is immutable after creation, you won't even need any additional synchronization:

#include <map>
#include <memory>
#include <string>

class cache {
public:
    using map_t = std::map<std::string, value>;
    void update_cache();
    std::shared_ptr<const map_t> get_cache() const;
private:
    std::shared_ptr<const map_t> cache_;
};

void cache::update_cache()
{
    while(true)
    {
        if(recv() == "update")
        {
            auto new_info = std::make_shared<map_t>();
            // Get new info, store in new_info
            // Make immutable & publish
            std::atomic_store(&cache_,
                              std::shared_ptr<const map_t>{std::move(new_info)});
        }
    }
}

auto cache::get_cache() const -> std::shared_ptr<const map_t> {
    return std::atomic_load(&cache_);
}
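Readers then simply hold the returned snapshot for as long as they need it; the old map is freed when the last reader releases it. A hypothetical caller (handle_request and key are illustrative, not part of the question):

void handle_request(cache& c, const std::string& key)
{
    // Pin the current map; it stays alive for this request even if
    // update_cache() publishes a replacement in the meantime.
    std::shared_ptr<const cache::map_t> snapshot = c.get_cache();
    auto it = snapshot->find(key);
    if (it != snapshot->end()) {
        // use it->second ...
    }
} // snapshot released here; the last owner deletes the old map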

Upvotes: 1

Joky

Reputation: 1628

I feel there are multiple issues:

1) Do not leak memory: to that end, never use a raw delete in your code; stick with unique_ptr (or shared_ptr in specific cases)

2) Protect accesses to shared data, either with locking (a mutex) or with a lock-free mechanism (std::atomic)

class Cache {
    using Map = std::map<std::string, value>;
    std::unique_ptr<Map> m_cache;
    std::mutex m_cacheLock;
public:

    void update_cache()
    {
        while(true)
        {
            if(recv().compare("update") == 0)
            {
                std::unique_ptr<Map> new_info { new Map };
                //Get new info, store in new_info
                {
                   std::lock_guard<std::mutex> lock{m_cacheLock};
                   using std::swap;
                   swap(m_cache, new_info);
                }
            }
        }
    }

Note: I don't like update_cache() being part of the cache's public interface, as it contains an infinite loop. I would probably externalize the loop and the recv, and have:

    void update_cache(std::unique_ptr<Map> new_info)
    {
        { // This inner brace is not useless: we don't want to hold the lock during deletion of the old map
           std::lock_guard<std::mutex> lock{m_cacheLock};
           using std::swap;
           swap(m_cache, new_info);
        }
    }

Now for the reading to the cache, use proper encapsulation and don't leave the pointer to the member map escape:

    value get(const std::string &key)
    {
        // Lock, fetch, and return a copy. Depending on the value type,
        // you might want to allocate memory before locking.
        std::lock_guard<std::mutex> lock{m_cacheLock};
        return m_cache->at(key); // throws std::out_of_range if the key is absent
    }

With this signature you have to throw an exception if the value is not present in the cache; another option is to return something like a boost::optional.
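For example, a non-throwing variant might look like this (a sketch; the name try_get is illustrative, and it requires <boost/optional.hpp>):

    boost::optional<value> try_get(const std::string &key)
    {
        std::lock_guard<std::mutex> lock{m_cacheLock};
        auto it = m_cache->find(key);
        if (it == m_cache->end())
            return boost::none;
        return it->second; // copies the value while the lock is held
    }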

Overall you can keep latency low (everything is relative; I don't know your use case) if you take care to do costly operations (memory allocation, for instance) outside of the locked section.

Upvotes: 2
