yi1
yi1

Reputation: 413

Efficient building of Set in Scala, without o(n^2), if I know there are no duplicates

I am required to pass a Set in Scala to a series of functions. I don't have a choice, it must be a Set.

I receive a really big List of JSON values:

scala.List[JsonAST.JValue]

The naive way would be to use toSet:

val input : scala.List[JsonAST.JValue]
val mySet = input.toSet

But this can be very expensive from a performance perspective. The way toSet works is it starts with the first element in the list, and then adds the following ones one-by-one. Each time, it scans the Set to ensure the value isn't already there.

Let's assume I know for a fact that there aren't any duplicates. Is there a way for me to create a Set more efficiently? Ideally, just copying the List into the Set. No changes in order or contents.

Upvotes: 0

Views: 223

Answers (2)

Régis Jean-Gilles
Régis Jean-Gilles

Reputation: 32719

As ziggystar pointed out, you can only have O(1) performance when converting a List to a Set if you are willing to use an adapter, and then trade fast conversion (O(1)) to a set with poor performance when calling contains (O(n)).

If that is an acceptable trade off for your use case, you can use something like this:

import scala.collection.AbstractSet

class SetAdapter[A](val self: Seq[A]) extends AbstractSet[A] {
  def +(elem: A): SetAdapter[A] = if (self.contains(elem)) this else new SetAdapter(self :+ elem)
  def -(elem: A): SetAdapter[A] = new SetAdapter(self.filterNot(_ == elem))
  def contains(elem: A): Boolean = self.contains(elem)
  // assumes that are not duplicates in self, otherwise it will 
  // incorrectly iterate several times over equal values, which a set is not supposed to do
  def iterator: Iterator[A] = self.iterator
}

implicit class AsSetOps[A](val seq: Seq[A]) extends AnyVal {
  def asSet() = new SetAdapter[A](seq)
}

Quick REPL test to check that it works as expected:

scala> val s = l.asSet
s: SetAdapter[Int] = Set(1, 3, 4, 8, 12)

scala> s + 6
res8: SetAdapter[Int] = Set(1, 3, 4, 8, 12, 6)

scala> s - 8
res9: SetAdapter[Int] = Set(1, 3, 4, 12)

In your example, all you'd have to do then is this:

val input : scala.List[JsonAST.JValue]
val mySet = input.asSet

Upvotes: 1

ziggystar
ziggystar

Reputation: 28690

Use a mutable.HashSet

If the set is represented as a tree, which it most probably is, then the complexity of building the set is only O(n·log n). Otherwise use a mutable.HashSet for building, which results in amortized O(n).

You can achieve this with list.to[mutable.HashSet].

O(1) conversion (constant time)

You cannot build a Set from a List in O(1), as you should have to visit each element at least once. So O(n) is the fastest in a reasonable sense. You could have some Set-wrapper around the List. But then access is not fast anymore, as it delegates to List.elementAt().

Upvotes: 2

Related Questions