Dan Dinu

Reputation: 33398

Performance of ConcurrentBag, many reads, rare modifications

I'm trying to build a model where there will be many reads of an entire collection and only rare additions and modifications to it.

I thought I might use ConcurrentBag<T> in .NET, as I've read the documentation and it's supposed to be good for concurrent reads and writes.

The code would look like this:

public class Cache
{
    ConcurrentBag<string> cache = new ConcurrentBag<string>();

    // this method gets called frequently
    public IEnumerable<string> GetAllEntries()
    {
        return cache.ToList();
    }

    // this method gets rarely called
    public void Add(string newEntry)
    {
        cache.Add(newEntry);
    }

    public void Remove(string entryToRemove)
    {
        // remove from the ConcurrentBag
        // (note: ConcurrentBag<T> has no targeted Remove; TryTake only removes an arbitrary item)
    }
}

However, I've decompiled the ConcurrentBag class, and in GetEnumerator a lock is always taken, which means every call to GetAllEntries will lock the entire collection and won't perform well under heavy concurrent reads.

To get around this, I'm thinking of coding it like this instead, using a plain list:

public class Cache
{
    private object guard = new object();

    IList<string> cache = new List<string>();

    // this method gets called frequently
    public IEnumerable<string> GetAllEntries()
    {
        var currentCache = cache;
        return currentCache;
    }

    // this method gets rarely called
    public void Add(string newEntry)
    {
        lock (guard)
        {
            cache.Add(newEntry);
        }
    }

    public void Remove(string entryToRemove)
    {
        lock (guard)
        {
            cache.Remove(entryToRemove);
        }
    }
}

Since Add and Remove are rarely called, I don't care too much about locking access to the list there. On Get I might return a stale version of the list, but again I don't care; it will be fine for the next request.

Is the second implementation a good way to go?

EDIT

I've run a quick performance test and the results are the following:

Setup: populated the in-memory collection with 10000 strings.

Action: GetAllEntries concurrently 50000 times.

Result:

- ConcurrentBag (first implementation): 00:00:35.2393871
- plain list (second implementation): 00:00:00.0036959

Code below:

 using System;
 using System.Collections.Concurrent;
 using System.Collections.Generic;
 using System.Diagnostics;
 using System.Linq;
 using System.Threading.Tasks;

 class Program
 {
    static void Main(string[] args)
    {
        // warmup caches and stopwatch
        var cacheWitBag = new CacheWithBag();
        var cacheWitList = new CacheWithList();

        cacheWitBag.Add("abc");
        cacheWitBag.GetAllEntries();

        cacheWitList.Add("abc");
        cacheWitList.GetAllEntries();


        var sw = new Stopwatch();
        // warmup the stopwatch as well
        sw.Start();

        // initialize caches (rare writes, so no real reason to measure this part)
        for (int i =0; i < 50000; i++)
        {
            // use Guid.NewGuid() so each entry is unique (new Guid() is always the empty GUID)
            cacheWitBag.Add(Guid.NewGuid().ToString());
            cacheWitList.Add(Guid.NewGuid().ToString());
        }
        sw.Stop();

        // measure
        var program = new Program();

        sw.Restart(); // restart so the initialization loop above isn't counted in this measurement
        program.Run(cacheWitBag).Wait();
        sw.Stop();
        Console.WriteLine(sw.Elapsed);

        sw.Restart();
        program.Run2(cacheWitList).Wait();
        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }

    public async Task Run(CacheWithBag cache1)
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < 10000; i++)
        {
            tasks.Add(Task.Run(() => cache1.GetAllEntries()));
        }

        await Task.WhenAll(tasks);
    }
    public async Task Run2(CacheWithList cache)
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < 10000; i++)
        {
            tasks.Add(Task.Run(() => cache.GetAllEntries()));
        }

        await Task.WhenAll(tasks);
    }

    public class CacheWithBag
    {
        ConcurrentBag<string> cache = new ConcurrentBag<string>();

        // this method gets called frequently
        public IEnumerable<string> GetAllEntries()
        {
            return cache.ToList();
        }

        // this method gets rarely called 
        public void Add(string newEntry)
        {
            cache.Add(newEntry);
        }
    }


    public class CacheWithList
    {
        private object guard = new object();

        IList<string> cache = new List<string>();

        // this method gets called frequently
        public IEnumerable<string> GetAllEntries()
        {
            var currentCache = cache;
            return currentCache;
        }

        // this method gets rarely called 
        public void Add(string newEntry)
        {
            lock (guard)
            {
                cache.Add(newEntry);
            }
        }

        public void Remove(string entryToRemove)
        {
            lock (guard)
            {
                cache.Remove(entryToRemove);
            }
        }
    }

}


Upvotes: 1

Views: 1270

Answers (3)

Cory Nelson

Reputation: 30001

To improve on InBetween's solution:

class Cache
{
    ImmutableHashSet<string> cache = ImmutableHashSet.Create<string>();

    // readers just get a reference to the current immutable snapshot
    public IEnumerable<string> GetAllEntries()
    {
        return cache;
    }

    // writers publish a new set atomically via compare-and-exchange
    public void Add(string newEntry)
    {
        ImmutableInterlocked.Update(ref cache, (set, item) => set.Add(item), newEntry);
    }

    public void Remove(string entryToRemove)
    {
        ImmutableInterlocked.Update(ref cache, (set, item) => set.Remove(item), entryToRemove);
    }
}

This performs only atomic operations (no locking) and uses the .NET Immutable types.
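A quick usage sketch, assuming the Cache class above (the Demo class, loop counts and entry names are made up for illustration): readers can enumerate whatever snapshot they grabbed while a writer keeps publishing new sets, without enumeration ever throwing or blocking the writer.

    using System.Linq;
    using System.Threading.Tasks;

    class Demo
    {
        static void Main()
        {
            var cache = new Cache();

            // a writer keeps swapping new immutable sets into the field...
            var writer = Task.Run(() =>
            {
                for (int i = 0; i < 10000; i++)
                    cache.Add("entry-" + i);
            });

            // ...while a reader enumerates concurrently; each snapshot is
            // immutable, so enumeration never observes a half-applied update
            var reader = Task.Run(() =>
            {
                for (int i = 0; i < 10000; i++)
                    cache.GetAllEntries().Count();
            });

            Task.WaitAll(writer, reader);
        }
    }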

Upvotes: 5

Antonio

Reputation: 21

Instead of a lock on a guard object to protect a simple container, you should consider ReaderWriterLockSlim, which is optimized for exactly this read/write scenario: multiple readers are allowed at the same time, but only one writer is allowed and it blocks the other readers and writers. It is very useful in your scenario, where you read a lot but write only rarely.

Please note that you can also start as a reader and then, for some reason, decide to become a writer (upgrade the slim lock) inside your reading code.
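A minimal sketch of what that could look like for the cache in the question (the class shape mirrors the question's code; copying the list inside the read lock is my own assumption, so callers can enumerate after the lock has been released):

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;

    public class Cache
    {
        private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
        private readonly List<string> cache = new List<string>();

        // many threads can hold the read lock at the same time
        public IEnumerable<string> GetAllEntries()
        {
            rwLock.EnterReadLock();
            try
            {
                // copy while holding the read lock so callers can enumerate safely later
                return cache.ToList();
            }
            finally
            {
                rwLock.ExitReadLock();
            }
        }

        // writers are exclusive: they block other writers and all readers
        public void Add(string newEntry)
        {
            rwLock.EnterWriteLock();
            try
            {
                cache.Add(newEntry);
            }
            finally
            {
                rwLock.ExitWriteLock();
            }
        }

        public void Remove(string entryToRemove)
        {
            rwLock.EnterWriteLock();
            try
            {
                cache.Remove(entryToRemove);
            }
            finally
            {
                rwLock.ExitWriteLock();
            }
        }
    }

For the upgrade case mentioned above, ReaderWriterLockSlim exposes EnterUpgradeableReadLock / ExitUpgradeableReadLock.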

Upvotes: 1

InBetween

Reputation: 32760

In your current scenario, where Add and Remove are rarely called, I'd consider the following approach:

public class Cache
{
    private object guard = new object();
    private SomeImmutableCollection<string> cache = new SomeImmutableCollection<string>();

    // this method gets called frequently
    public IEnumerable<string> GetAllEntries()
    {
        return cache;
    }

    // this method gets rarely called
    public void Add(string newEntry)
    {
        lock (guard)
        {
            cache = cache.Add(newEntry);
        }
    }

    public void Remove(string entryToRemove)
    {
        lock (guard)
        {
            cache = cache.Remove(entryToRemove);
        }
    }
}

The fundamental change here is that cache is now an immutable collection, which means it can't change, ever. So concurrency problems with the collection itself simply disappear: something that can't change is inherently thread-safe.

Also, depending on how rare calls to Add and Remove are, you could even consider removing the lock in both of them, because all it's doing now is preventing a race between Add and Remove that could lose a cache update. If that scenario is very, very improbable you could get away with it. That said, I very much doubt the few nanoseconds an uncontended lock takes are a relevant enough factor to actually consider this ;)

SomeImmutableCollection is a placeholder for whichever of the collections in System.Collections.Immutable best suits your needs.
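For example, a concrete version of the sketch above using ImmutableList<string> (just one possible choice; note that the immutable collections are created via their Empty property rather than a constructor):

    using System.Collections.Generic;
    using System.Collections.Immutable;

    public class Cache
    {
        private object guard = new object();
        private ImmutableList<string> cache = ImmutableList<string>.Empty;

        // readers simply hand out the current immutable snapshot
        public IEnumerable<string> GetAllEntries()
        {
            return cache;
        }

        // writers build a new list and publish it by overwriting the field
        public void Add(string newEntry)
        {
            lock (guard)
            {
                cache = cache.Add(newEntry);
            }
        }

        public void Remove(string entryToRemove)
        {
            lock (guard)
            {
                cache = cache.Remove(entryToRemove);
            }
        }
    }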

Upvotes: 3
