codereviewanskquestions
codereviewanskquestions

Reputation: 13998

Refreshing cache without impacting latency to access the cache

I have a cache refresh logic and want to make sure that it's thread-safe and correct way to do it.

public class Test {

    Set<Integer> cache = Sets.newConcurrentHashSet();

    public boolean contain(int num) {
        return cache.contains(num);
    }

    public void refresh() {
        cache.clear();
        cache.addAll(getNums());
    }
}

So I have a background thread refreshing cache - periodically call refresh. And multiple threads are calling contain at the same time. I was trying to avoid having synchronized in the methods signature because refresh could take some time (imagine that getNum makes network calls and parsing huge data) then contain would be blocked.

I think this code is not good enough because if contain called in between clear and addAll then contain always returns false.

What is the best way to achieve cache refreshing without impacting significant latency to contain call?

Upvotes: 2

Views: 743

Answers (3)

Derrops
Derrops

Reputation: 8127

Best way would be to use functional programming paradigm whereby you have immutable state (in this case a Set), instead of adding and removing elements to that set you create an entirely new Set every time you want to add or remove elements. This is in Java9.

It can be a bit awkward or infeasible however to achieve this method for legacy code. So instead what you could do is have 2 Sets 1 which has the get method on it which is volatile, and then this is assigned a new instance in the refresh method.

public class Test {

    volatile Set<Integer> cache = new HashSet<>();

    public boolean contain(int num) {
        return cache.contains(num);
    }

    public void refresh() {
        Set<Integer> privateCache = new HashSet<>();
        privateCache.addAll(getNums());
        cache = privateCache;
    }
}

Edit We don't want or need a ConcurrentHashSet, that is if you want to add and remove elements to a collection at the same time, which in my opinion is a pretty useless thing to do. But you want to switch the old Set with a new one, which is why you just need a volatile variable to make sure you can't read and edit the cache at the same time.

But as I mentioned in my answer at the start is that if you never modify collections, but instead make new ones each time you want to update a collection (note that this is a very cheap operation as internally the old set is reused in the operation). This way you never need to worry about concurrency, as there is no shared state between threads.

Upvotes: 3

CCC
CCC

Reputation: 2761

How would you make sure your cache doesn't contain invalid entries when calling contains?? Furthermore, you'd need to call refresh every time getNums() changes, which is pretty inefficient. It would be best if you make sure you control your changes to getNums() and then update cache accordingly. The cache might look like:

public class MyCache {

    final ConcurrentHashMap<Integer, Boolean> cache = new ConcurrentHashMap<>(); //it's a ConcurrentHashMap to be able to use putIfAbsent 

    public boolean contains(Integer num) {
        return cache.contains(num);
    }

    public void add(Integer nums) {
      cache.putIfAbsent(num, true);
    }

    public clear(){
       cache.clear();
    }

    public remove(Integer num) {
       cache.remove(num);
    }

}

Upvotes: 0

payloc91
payloc91

Reputation: 3809

Update

As @schmosel made me realize, mine was a wasted effort: it is in fact enough to initialize a complete new HashSet<> with your values in the refresh method. Assuming of course that the cache is marked with volatile. In short @Snickers3192's answer, points out what you seek.


Old answer

You can also use a slightly different system.

Keep two Set<Integer>, one of which will always be empty. When you refresh the cache, you can asynchronously re-initialize the second one and then just switch the pointers. Other threads accessing the cache won't see any particular overhead in this.

From an external point of view, they will always be accessing the same cache.

private volatile int currentCache; // 0 or 1

private final Set<Integer> caches[] = new HashSet[2]; // use two caches; either one will always be empty, so not much memory consumed

private volatile Set<Integer> cachePointer = null; // just a pointer to the current cache, must be volatile

// initialize
{
    this.caches[0] = new HashSet<>(0);
    this.caches[1] = new HashSet<>(0);

    this.currentCache = 0;

    this.cachePointer = caches[this.currentCache]; // point to cache one from the beginning
}

Your refresh method may look like this:

public void refresh() {

    // store current cache pointer
    final int previousCache = this.currentCache;
    final int nextCache = getNextPointer();

    // you can easily compute it asynchronously
    // in the meantime, external threads will still access the normal cache
    CompletableFuture.runAsync( () -> {
        // fill the unused cache
        caches[nextCache].addAll(getNums());
        // then switch the pointer to the just-filled cache
        // from this point on, threads are accessing the new cache
        switchCachePointer();
        // empty the other cache still on the async thread
        caches[previousCache].clear();
    });
}

where the utility methods are:

public boolean contains(final int num) {
    return this.cachePointer.contains(num);
}

private int getNextPointer() {
    return ( this.currentCache + 1 ) % this.caches.length;
}

private void switchCachePointer() {
    // make cachePointer point to a new cache
    this.currentCache = this.getNextPointer();
    this.cachePointer = caches[this.currentCache];

}

Upvotes: -1

Related Questions