Ralph

Reputation: 32284

Convert a mutable Map of Seq to an immutable Map of IndexedSeq in Scala

I need to process a large number of records (several million) representing people. I would like to create a partition based on the year-of-birth, and then process each group separately. I am trying to create a functional solution (no/minimal mutable data), so that it will be thread-safe and can be parallelized.

For my first attempt, I created a tail-recursive function that builds a Map[Int, IndexedSeq[Person]] mapping each year-of-birth to a sequence of people records. I need an indexed sequence because I will be doing random accesses to the people in each group. Here is my code:

import scala.annotation.tailrec

@tailrec
def loop(people: Seq[Person],
         map: Map[Int, IndexedSeq[Person]] = Map()): Map[Int, IndexedSeq[Person]] = {
  if (people.isEmpty) map
  else {
    val person = people.head
    val yearOfBirth = person.yearOfBirth
    // Look up the group for this year, or start a new empty one
    val seq = map.getOrElse(yearOfBirth, IndexedSeq())
    // Recurse with an updated immutable map that includes this person
    loop(people.tail, map + (yearOfBirth -> (seq :+ person)))
  }
}

This works, but is not very efficient. I can do better by allowing a small amount of very localized mutability. If all of the mutable variables are on the stack, the code will still be thread-safe, as long as the output Map is immutable.

I would like to implement this by internally building a mutable Map[Int, List[Person]] and then efficiently converting it to an immutable Map[Int, IndexedSeq[Person]] as the return value.

How can I convert the mutable Map of List items to an immutable Map[Int, IndexedSeq[Person]] in the most efficient manner possible? Note that there is no particular order to the people in each year-of-birth group.
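For illustration, the locally mutable approach described above could be sketched roughly as follows. This is only a sketch under assumptions: the Person case class with a name field and the names LocalMutation and partitionByYear are hypothetical, not from the original code. The mutable map never leaves the method, and the final step builds the immutable result in one pass.

```scala
import scala.collection.mutable

// Hypothetical record type; only yearOfBirth comes from the question.
final case class Person(name: String, yearOfBirth: Int)

object LocalMutation {
  def partitionByYear(people: Seq[Person]): Map[Int, IndexedSeq[Person]] = {
    // Mutable accumulator that never escapes this method.
    val acc = mutable.Map.empty[Int, List[Person]]
    for (p <- people) {
      // Prepending to a List is O(1); the order within each group ends up
      // reversed, which is acceptable because the groups are unordered.
      acc(p.yearOfBirth) = p :: acc.getOrElse(p.yearOfBirth, Nil)
    }
    // Strict, one-pass conversion to an immutable Map of Vectors.
    acc.iterator.map { case (year, ps) => year -> ps.toVector }.toMap
  }
}
```

Vector is used here as the IndexedSeq implementation because it offers effectively constant-time random access; whether this is the most efficient option for several million records would need measuring.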

Upvotes: 1

Views: 762

Answers (1)

Nicolas

Reputation: 24769

Why don't you use the groupBy function of the Seq trait? (The documentation is here: http://www.scala-lang.org/api/current/index.html#scala.collection.Seq)

def groupByYearOfBirth(people: Seq[Person]) = people.groupBy(_.yearOfBirth)

Edit: contrary to my initial suggestion, don't use .mapValues(_.toIndexedSeq) to produce an IndexedSeq. Daniel explains why in a comment below.
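If an IndexedSeq per group is still needed, one possible alternative is to follow groupBy with a plain map, which builds a new strict Map once. The sketch below assumes a hypothetical Person case class with a name field and a hypothetical GroupByYear wrapper object. The usual concern with mapValues in Scala 2.12 and earlier is that it returns a lazy view that re-applies the function on every lookup; that is a general property of the method, and I am assuming it is the issue referred to here.

```scala
// Hypothetical record type; only yearOfBirth comes from the question.
final case class Person(name: String, yearOfBirth: Int)

object GroupByYear {
  def groupByYearOfBirth(people: Seq[Person]): Map[Int, IndexedSeq[Person]] =
    people
      .groupBy(_.yearOfBirth)
      // map (unlike mapValues) eagerly builds a new immutable Map, so each
      // group is converted to an IndexedSeq exactly once.
      .map { case (year, group) => year -> group.toIndexedSeq }
}
```

A call such as GroupByYear.groupByYearOfBirth(people)(1980)(0) then does an indexed lookup on an already-converted group rather than converting it again on each access.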

Upvotes: 6
