Big McLargeHuge
Big McLargeHuge

Reputation: 16056

How to demonstrate thread safety issue

A coworker told me this Cache object needs to be made thread-safe:

import scala.collection.mutable

object Cache {
    private val data = mutable.Map[Int, String]()

    def get(key: Int): Option[String] = data.get(key)

    def set(key: Int, value: String): Unit = data(key) = value
}

This is my attempt to reproduce the issue:

def expensiveMethod(): String = {
    Thread.sleep(1)
    Thread.currentThread.getName
}

for (i <- 1 to 3) {
    new Thread(() => {
        val cachedValue = Cache.get(1)

        if (cachedValue.nonEmpty) {
            println(s"${Thread.currentThread.getName}: cached value = ${cachedValue}")
        } else {
            val newValue = expensiveMethod()
            Cache.set(1, newValue)

            val finalValue = Cache.get(1)
            println(s"${Thread.currentThread.getName}: final value = ${finalValue}")
        }
    }).start()
}

// Thread-2: final value = Some(Thread-2)
// Thread-3: final value = Some(Thread-2)
// Thread-1: final value = Some(Thread-2)

I'm trying to get my head around it. Three threads modify the data at the same time. Only one wins. Why is that a problem?

I do find it strange that in-between Thread-3 setting and reading the value, apparently Thread-2 has changed it. Is that the crux of the issue?

I tried using a lock but it didn't completely prevent that from happening, though it did seem to reduce the issue:

    def set(key: Int, value: String): Unit = data.synchronized { data(key) = value }

I also tried using concurrent.TrieMap instead of mutable.Map. It also appeared to reduce the issue but without running hundreds of tests I don't know how to be sure.

Upvotes: 0

Views: 243

Answers (2)

user1643723
user1643723

Reputation: 4192

Your code is

  1. Too simple
  2. Has too few threads

If you want to "demonstrate" this specific issue by causing a crash or wrong output (purely for entertainment purposes), you need to simultaneously modify the map from multiple threads.

Java Map is quite fast, so you will have to run a large number of Threads, each one modifying the Map in infinite while loop (starting a Thread and executing an expensiveMethod are quite slow operations, so they reduce chance of reproducing the thread-safety issue).

In addition, you need to understand the internal structure of common HashMap implementation: simply setting the same key like this

Cache.set(1, newValue)

will not actually induce structural changes to the map — it will calculate a hash code and set array value at fixed offset — as you correctly observed such code is merely racy, but not necessarily wrong and will not usually result in crashes.

On other hand, if you concurrently add and remove random keys to Cache in tight loop, it will eventually crash with 100% possibility (this will probably happen in the code that writes values or resizes the underlying storage).

Note, that simply reading values is always 100% thread-safe. The crash is induced by mutation — but can come either from get or put method after mutation happens. A read-only Map and other strictly read-only structures are always thread-safe, even without synchronization (this property is actually exploited by CopyOnWriteArrayList).


That said, you should never rely on demonstrations and reproduction to evaluate and resolve thread-safety issues. As explained above, the difference between non thread-safe code, that "works" and the one, that crashes or miscalculates, is minuscule. Today all your tests are passing. Tomorrow, someone changes a single seemingly unrelated line and everything breaks.

Worse yet, you, personally, and your coworkers, programmers, testers etc. might never be able to reproduce a bug due to element of randomness inherent to thread-safety issues. But due to the law of large numbers your users might regularly experience unpredictable crashes and malfunctions — because they are the ones who run your program regularly and for much longer periods of time than you do.

Upvotes: 1

Dima
Dima

Reputation: 40500

in-between Thread-3 setting and reading the value, apparently Thread-2 has changed it. Is that the crux of the issue?

Well, that is one of the issues. Your expensiveMethod can potentially be called several times, which is, at best, wasteful, but could potentially be more problematic if it has side-effects.

Like, if you were to keep statistics about your cache affinity, they'd be all off. Or you could cause a deadlock in the database ... etc.

A bigger (albeit, harder to reproduce on purpose) problem is Map implementation itself is not thread-safe.

This means that if one thread is doing put, while the other one is inside get, the result is undefined, and could even cause the JVM itself to crash.

To avoid the second (bigger) problem, you could just use java's ConcurrentHashMap (or concurrent.TrieMap, though, it would be less efficient). That will guarantee always having a valid (albeit, possibly outdated) result from get, and assurance against random crashes, but does nothing to ease the first issue with potential redundant calls to expensiveMethod.

To avoid that (the wasted calls), you would need to synchronize the entire block:

    val cachedValue = Cache.synchronized { 
       Cache.get(1).getOrElse { 
        val value = expensiveMethod()
        Cache.set(1, value)
        value
       }
    }

Upvotes: 1

Related Questions