cen0r

Reputation: 188

Scala groupBy+map issue

Live example here: Scastie Example

I don't understand how this works. I have a Seq of tuples like so:

val v = Seq(
  ("[email protected]",2), 
  ("[email protected]",2), 
  ("[email protected]",9), 
  ("[email protected]",10)
)

I want to group them like so:

v.groupBy{ case(email, id) => id }

This results in:

Map(
  2 -> List(
      ([email protected],2),
      ([email protected],2)
  ), 
  10 -> List(
      ([email protected],10)
  ), 
  9 -> List(
      ([email protected],9)
  )
)

This makes perfect sense, but now if I map them again like so:

v.groupBy{ case(email, id) => id}.map{case(id, data) => data.head}.toSeq

I expect the result to be:

Vector(([email protected],2), ([email protected],10), ([email protected],9))

However I get:

Vector(([email protected],9))

What's wrong?

Upvotes: 3

Views: 103

Answers (4)

RAGHHURAAMM

Reputation: 1099

This goes wrong because you called map on a Map, as Andrey Tyukin already pointed out. Convert the Map to a List first, then apply map with a suitable function. This works:

 v.groupBy{ case(email, id) => id}.toList.map(_._2.head)

Upvotes: 0

Simon

Reputation: 6363

When you call groupBy you get a Map[Int, Seq[(String, Int)]]. The map method operates on each entry of that Map, i.e. on each (key, value) pair. If you only want to transform the values, you can use mapValues:

v.groupBy{ case(email, id) => id}.mapValues(...
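A minimal sketch of the values-only approach, assuming Scala 2.13 or later (where `.view.mapValues(...).toMap` is the non-deprecated spelling) and assuming the goal is to keep the first tuple per id. The address `user@example.com` and the object name `MapValuesSketch` are placeholders, since the addresses in the question are obscured:

```scala
object MapValuesSketch {
  // Placeholder data: one hypothetical address stands in for the obscured ones.
  val v: Seq[(String, Int)] = Seq(
    ("user@example.com", 2),
    ("user@example.com", 2),
    ("user@example.com", 9),
    ("user@example.com", 10)
  )

  // The keys (ids) are left untouched; only each group is reduced to its head,
  // so no entries can collide.
  val firstPerId: Map[Int, (String, Int)] =
    v.groupBy { case (_, id) => id }
      .view
      .mapValues(_.head)
      .toMap

  def main(args: Array[String]): Unit =
    println(firstPerId.size) // one entry per distinct id
}
```

Because the ids remain the keys, the result still has one entry per group; call `.values` on it if only the tuples are needed.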

Upvotes: 0

Yuval Itzchakov

Reputation: 149538

This is indeed a bit confusing. It happens because map on a Map[K, V], when given a function that returns pairs, builds another Map[K', V'], and because your new keys are all the same (the mailing address), you get back only a single entry.

This can be avoided using .values which returns an Iterable of the values in the Map, and then .map:

v
 .groupBy { case (_, id) => id }
 .values
 .map(_.head)
 .toList

Upvotes: 2

Andrey Tyukin

Reputation: 44918

This happens when you carelessly invoke map on a Map. In this case, the pairs

  2 -> List(
      ([email protected],2),
      ([email protected],2)
  ), 
  10 -> List(
      ([email protected],10)
  ), 
  9 -> List(
      ([email protected],9)
  )

are transformed into pairs

([email protected],2)
([email protected],10)
([email protected],9)

and then inserted again into a freshly constructed map, where each pair overwrites the previous one because they share the same key: the value 2 is overwritten by 10 and then by 9. The final result is a map of type Map[String, Int] with a single entry ([email protected],9), which is of course not what you wanted.

Do this instead:

println(v.groupBy{ case(email, id) => id}.toSeq.map{case(id, data) => data.head})
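The difference can be checked side by side with a small sketch. It assumes all addresses are identical, as the answers imply; `user@example.com` and the object name `GroupByDemo` are placeholders, because the real addresses are obscured in the question:

```scala
object GroupByDemo {
  // Placeholder data: the same hypothetical address is reused on purpose
  // to reproduce the key collision.
  val v: Seq[(String, Int)] = Seq(
    ("user@example.com", 2),
    ("user@example.com", 2),
    ("user@example.com", 9),
    ("user@example.com", 10)
  )

  val grouped: Map[Int, Seq[(String, Int)]] =
    v.groupBy { case (_, id) => id }

  // map on a Map with a pair-returning function rebuilds a Map, so the
  // three pairs that share the same key (the address) collapse into one.
  val collapsed: Map[String, Int] =
    grouped.map { case (_, data) => data.head }

  // Converting to a Seq first yields a plain sequence of pairs; nothing is lost.
  val kept: Seq[(String, Int)] =
    grouped.toSeq.map { case (_, data) => data.head }

  def main(args: Array[String]): Unit = {
    println(collapsed.size) // 1
    println(kept.size)      // 3
  }
}
```

Which id survives in `collapsed` depends on the iteration order of the grouped Map, so only its size is shown here.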

Upvotes: 2
