BorisOkunskiy
BorisOkunskiy

Reputation: 1828

Scala: fastest `remove(i: Int)` in mutable sequence

Which implementation from scala.collection.mutable package should I take if I intend to do lots of by-index-deletions, like remove(i: Int), in a single-threaded environment? The most obvious choice, ListBuffer, says that it may take linear time depending on buffer size. Is there some collection with log(n) or even constant time for this operation?

Upvotes: 5

Views: 5178

Answers (3)

Sheng
Sheng

Reputation: 1707

Java's ArrayList effectively has constant time complexity if the last element is the one to be removed. Look at the following snippet copied from its source code,

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work

As you can see, if numMoved is equal to 0, remove will not shift and copy the array at all. This in some scenarios can be quite useful. For example, if you do not care about the ordering that much, to remove an element, you can always swap it with the last element, and then delete the last element from the ArrayList, which effectively makes the remove operation all the way constant time. I was hoping ArrayBuffer would do the same, unfortunately that is not the case.

Upvotes: 1

Mike
Mike

Reputation: 1849

Depending on your exact use case, you may be able to use LinkedHashMap from scala.collection.mutable.

Although you cannot remove by index, you can remove by a unique key in constant time, and it maintains a deterministic ordering when you iterate.

scala> val foo = new scala.collection.mutable.LinkedHashMap[String,String]
foo: scala.collection.mutable.LinkedHashMap[String,String] = Map()

scala> foo += "A" -> "A"
res0: foo.type = Map((A,A))

scala> foo += "B" -> "B"
res1: foo.type = Map((A,A), (B,B))

scala> foo += "C" -> "C"
res2: foo.type = Map((A,A), (B,B), (C,C))

scala> foo -= "B"
res3: foo.type = Map((A,A), (C,C))

Upvotes: 1

Eugene Yokota
Eugene Yokota

Reputation: 95624

Removal operators, including buf remove i, are not part of Seq, but it's actually part of Buffer trait under scala.mutable. (See Buffers)

See the first table on Performance Characteristics. I am guessing buf remove i has the same characteristic as insert, which are linear for both ArrayBuffer and ListBuffer. As documented in Array Buffers, they use arrays internally, and Link Buffers use linked lists (that's still O(n) for remove).

As an alternative, immutable Vector may give you an effective constant time.

Vectors are represented as trees with a high branching factor. Every tree node contains up to 32 elements of the vector or contains up to 32 other tree nodes. [...] So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time".

scala> import scala.collection.immutable._
import scala.collection.immutable._

scala> def remove[A](xs: Vector[A], i: Int) = (xs take i) ++ (xs drop (i + 1))
remove: [A](xs: scala.collection.immutable.Vector[A],i: Int)scala.collection.immutable.Vector[A]

scala> val foo = Vector(1, 2, 3, 4, 5)
foo: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)

scala> remove(foo, 2)
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)

Note, however, a high constant time with lots of overhead may not win a quick linear access until the data size is significantly large.

Upvotes: 8

Related Questions