ArrayList.remove() (removeAt() in Kotlin) causes weird IndexOutOfBoundsException

Question

My code is simple but causes a weird IndexOutOfBoundsException.

It happens only once in ten thousands, but it happens. I can see them only in the Firebase Crashlytics Report.

Could you see the problem or should I ignore it?

private val mTaskList = ArrayList()
// a element is added to the list.
// The list usually contains only one element.
private var worker: TaskWorker? = null
...
// Only one TaskWorker can exist.
  if(worker==null){
    Thread(TaskWorker().also{
      worker = it
    }).start()
  }
...

inner class TaskWorker : Runnable {
  override fun run() {
    while (mTaskList.size > 0) {
      TaskList.removeAt(0).run {  // causes java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1 (or Index: 0, Size: 0)
        ...
      }
    }
    worker = null // Only one TaskWorker can exist.
  }
}

TaskList.removeAt(0) causes the execptions. (I can see them only in the Firebase Crashlytics report)

Fatal Exception: java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1

or

Fatal Exception: java.lang.ArrayIndexOutOfBoundsException: Index: 0, Size: 0

Stacktrace is like this

java.util.ArrayList.remove (ArrayList.java:503)
.MainActivity$TaskWorker.run (MainActivity.kt:452)
java.lang.Thread.run (Thread.java:923)

tyg · Accepted Answer

The reason for the IndexOutOfBoundsException will most likely be a race condition, meaning that multiple threads compete for a single resource, namely the access to mTaskList.

When two threads run simultaneously, they both can check that mTaskList.size > 0 is true why they then both continue to execute TaskList.removeAt(0) (which I assume is actually meant to be mTaskList.removeAt(0)). The first one succeeds, the second one throws the exception because there are not elements left to remove.

Assuming there is no other write access to that array except by TaskWorker, the culprit will most likely be the initialization of the worker thread. There are four non-atomic operations that are relevant. non-atomic means that another thread can interfere because they are not executed as an inseparable unit:

if (worker == null)
TaskWorker()
worker = it
Thread::start()

It's an even worse race condition than what you have with your array: Two threads can both check for (1.) and then falsly assume they can safely execute the rest. There will be two TaskWorker instances (2.), but only the second one will be accessible by the variable worker (3.). The first one, though, still lives on in the first Thread object and both Threads will be started concurrently with their own TaskWorker instance (4.) which is in turn the cause for the race condition with mTaskList.

There are several ways how this can be solved. As @broot suggested in the comments you can make sure that multiple calls to the initialization of TaskWorker as well as the access to the array are only ever executed on the same thread. That way it is guaranteed that no other thread can interfere. The easiest way to achieve this is to leave the calls on the main thread. However, that may not be an option if the load is too heavy. In that case you could confine the calls to a specific thread by defining a new dispatcher with newSingleThreadContext. See https://kotlinlang.org/docs/shared-mutable-state-and-concurrency.html#thread-confinement-coarse-grained for more.

Another way would be to rely on the structured concurrency of Kotlin coroutines and refrain from creating your own threads in the first place. Replace your TaskWorker class with this:

suspend fun executeTasks() = withContext(Dispatchers.Default) {
    while (true) {
        ensureActive()
        val task = taskListMutex.withLock { mTaskList.removeFirstOrNull() } ?: break
        // do whatever you need to do with the task
    }
}

For this to work you also need a Mutex instance, probably best located where mTaskList also is:

private val taskListMutex = Mutex()

Now, instead of creating a Thread to start the task execution you can simply call executeTasks(). I chose Dispatchers.Default because I assumed the tasks have a heavy computational load. If it is IO related you should use Dispatchers.IO instead. Or you leave all of that up to the task execution itself and remove the withContext at this point alltogether.

What executeTasks effectively does different to your code is to synchronize the access to mTaskList with a Mutex. A Mutex provides a simple locking mechanism where it is guaranteed that the code wrapped with withLock can only ever be executed one at a time (that is, atomically). Since withLock is a suspend function, all other coroutines trying to also run the code will suspend and wait for the currently running coroutine to finish before they can execute the wrapped code.

If called concurrently from different coroutines instead of the race condition leading to an IndexOutOfBoundsException you will end up with multiple separate task queues being executed in parallel with the mutex synchronizing their access to the list. Since "the list usually contains only one element" this won't have much impact, though. If, however, you need a guarantee that no two tasks may be executed at the same time, you should wrap the entire executeTasks in the Mutex's withLock.

With the Mutex in place, you should also wrap all other access to the list in a withLock block (for the same Mutex, obviously), especially the code that inserts new elements into the list.

ArrayList.remove() (removeAt() in Kotlin) causes weird IndexOutOfBoundsException

Answers (1)

Related Questions