Why does this scala function run so much faster using a map compared to a list?

Question

I have implemented Papadimitriou's Algorithmn to solve the two-satisfiability problem in Scala. In order to increase the speed of my algorithm I do a reduction step first which eliminates all clauses where one of the values in the clause only appears either (xor) negated or non-negated throughout the entire clause set.

The algorithm is a local search algorithm that requires a random initial state to be set every loop. Rather than being completely random, I know the state of certain bits from the reduction step. So my code generates an array buffer of the initial state, with its inputs being the total length of the bit set, and a list of the constraints calculated from the reduction step.

Initially my implementation just used the list to check whether a constrain existed for a specific bit, or whether I need to randomly generate the bit:

// Is bit number x false (negX = true)?
case class Constraint(negX: Boolean, x: Int) extends Ordered[Constraint] {
  def compare(that: Constraint) = this.x - that.x
}

private def getInitial(len: Int, allConstraints: List[Constraint]): ArrayBuffer[Boolean] = {
  var constraints = allConstraints.sorted
  val buff = ArrayBuffer.fill(len)(false)
  for (i <- 0 to (len - 1)) {
    if (constraints.length > 0 && constraints.head.x == i) {
      buff(i) = !constraints.head.negX
      constraints = constraints.tail
    } else {
      buff(i) = math.random < 0.5
    }
  }
  buff
}

The above works perfectly, but after analysing my code I found that it was quite slow. I know lookup of lists are usually slow due to you having to traverse the entire list every time you want to search for a value, but I thought as I sorted the list and only was checking the head that it wouldn't be too slow.

Anyway, I tried a second implementation using map:

private def getInitial(len: Int, allConstraints: List[Constraint]): ArrayBuffer[Boolean] = {
  var constraintMap = allConstraints.map{ case Constraint(negX,x) => (x,negX)}.toMap
  val buff = ArrayBuffer.fill(len)(false)
  for (i <- 0 to (len - 1)) {
    buff(i) = !constraintMap.get(i).getOrElse(math.random < 0.5)
  }
  buff
}

To my surprise this is noticeably faster. So my question is why?

I have six different data sets so I tried to do some time profiling on the smallest dataset (100,000 variables/clauses) and the largest one (1,000,000 variables/clauses). Doing multiple runs I got the following best times (the times do vary due to the nature of the algorithm being random, the best times will have less repeats and so the time generating the initial condition will dominate the time)

/* Small dataset */
// Using List
( scala target/scala-2.13/papadimitriou-s-algorithmn_2.13-1.0.jar main1; )  9.71s user 0.16s system 116% cpu 8.451 total
// Using Map
( scala target/scala-2.13/papadimitriou-s-algorithmn_2.13-1.0.jar main1; )  3.94s user 0.14s system 301% cpu 1.351 total

/* Large dataset */
// Using List
// ...never finished within 15 minutes
// Using Map
( scala target/scala-2.13/papadimitriou-s-algorithmn_2.13-1.0.jar main6; )  72.85s user 1.70s system 428% cpu 17.384 total

Dima · Accepted Answer

The correct answer is in the comments, but an esthete in me can't refrain from noting how much nicer this would look in a actual idiomatic scala :)

def getInitial(len: Int, allConstraints: List[Constraint]) = {
  @tailrec
  def loop(i: Int, constraints: List[Constraint], result: List[Boolean]): List[Boolean] = 
    (i, constraints) match {
      case (0, _) => result.reversed
      case (_, head :: tail) if head.x == i => loop(i - 1, tail, !head.negX :: result)
      case _ => loop(i-1, constraints, Random.nextBoolean :: result)
    }

  loop(len-1, allConstraints.sorted.reversed)
}

I think, you can also do it without sorting if you use a prefilled array as the result (mutating array elements is yucky, but if performance is important, and size of constraints is large, this would shave a few cycles:

def getInitial(len: Int, allConstraints: List[Constraint]) = {
  val result = Array.fill(len)(Random.nextBoolean)
  allConstraints.foreach { c => result(c.x) = !c.negX }
  result
}

Why does this scala function run so much faster using a map compared to a list?

Answers (1)

Related Questions