kafman

Reputation: 2860

Scala map() on a Map[..] much slower than mapValues()

In a Scala program I wrote, I have a scala.collection.Map that maps a String to some calculated values (specifically, it's a Map[String, (Double, immutable.Map[String, Double], Double)]; I know that's ugly and should be, and will be, wrapped). Now, if I do this:

stats.map { case(c, (prior, pwc, denom)) => {
  println(c)
  ...
  }
}

it takes about 30 seconds to print roughly 50 values of c! The println is just a test statement; the actual calculation I need was even slower (I aborted after 1 minute of complete silence). However, if I do it like this:

stats.mapValues { case (prior, pwc, denom) => {
  println(prior)
  ...
  }
}

I don't run into these performance issues ... Can anyone explain why this is happening? Am I not following some important Scala guidelines?

Thanks for the help!

Edit:

I investigated the behaviour further. My guess is that the bottleneck comes from accessing the Map data structure. If I do the following, I have the same performance issues:

classes.foreach{c => {
  println(c)
  val ps = stats(c)
  }
}

Here classes is a List[String] that stores the keys of the Map externally. Without the access to stats(c) no performance losses occur.

Upvotes: 2

Views: 630

Answers (1)

Ben Reich

Reputation: 16324

mapValues actually returns a view on the original map, which can lead to unexpected performance issues. From this blog post:

...here is a catch: map and mapValues are different in a not-so-subtle way. mapValues, unlike map, returns a view on the original map. This view holds references to both the original map and to the transformation function (here (_ + 1)). Every time the returned map (view) is queried, the original map is first queried and the transformation function is called on the result.

I recommend reading the rest of that post for some more details.
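
To make the difference concrete, here is a minimal sketch that counts how often the transformation function runs. The names (MapValuesViewDemo, expensive, base, and the two helper methods) are made up for illustration and are not from the question's code. It assumes the plain mapValues method on Map; on Scala 2.13 and later that method is deprecated in favour of .view.mapValues, but as far as I know it still returns a lazy view with the same behaviour.

```scala
object MapValuesViewDemo {
  // Counts how often the transformation function is invoked.
  var calls = 0

  // Hypothetical stand-in for an expensive per-value computation.
  def expensive(x: Int): Int = {
    calls += 1
    x + 1
  }

  // Small illustrative map; the real `stats` map would go here.
  val base: Map[String, Int] = Map("a" -> 1, "b" -> 2, "c" -> 3)

  // mapValues builds a view: `expensive` runs again on every lookup.
  def callsViaMapValues(lookups: Int): Int = {
    calls = 0
    val view = base.mapValues(expensive)
    (1 to lookups).foreach(_ => view("a"))
    calls
  }

  // map builds a new strict Map: `expensive` runs once per entry, up front.
  def callsViaMap(lookups: Int): Int = {
    calls = 0
    val strict = base.map { case (k, v) => k -> expensive(v) }
    (1 to lookups).foreach(_ => strict("a"))
    calls
  }

  def main(args: Array[String]): Unit = {
    println(s"mapValues: ${callsViaMapValues(5)} calls to expensive")
    println(s"map:       ${callsViaMap(5)} calls to expensive")
  }
}
```

With five lookups of the same key, the mapValues version calls the function once per lookup, while the map version calls it once per entry when the map is built and never again. So if a map like stats was itself produced by mapValues with a costly function, every stats(c) would repeat that work, which would be consistent with the slow lookups described in the edit.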

Upvotes: 3
