Reputation: 617
Let's say I want to print the duplicates in a list together with their counts. I have 3 options, as shown below:
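def dups(dup: List[Int]) = {
  //1) group equal values, keep groups longer than 1 (guard), convert to Seq
  println(dup.groupBy(identity).collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }.toSeq)
  //2) group equal values, match groups with at least 2 elements, then re-count each key in dup
  println(dup.groupBy(identity).collect { case (x, List(_, _, _*)) => x }.map(x => (x, dup.count(y => x == y))))
  //3) take distinct values, count each one in dup, keep counts above 1
  println(dup.distinct.map((a: Int) => (a, dup.count((b: Int) => a == b))).filter((pair: (Int, Int)) => pair._2 > 1))
}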
Questions:
-> For option 2, is there any way to name the list parameter so that it can be used to append the size of the list, just like I did in option 1 using ys.size?
-> For option 1, is there any way to avoid the final call to toSeq and return a List directly?
-> Which of the 3 choices is the most efficient, i.e. uses the fewest loops (traversals)?
For example, the input List(1,1,1,2,3,4,5,5,6,100,101,101,102) should print List((1,3), (5,2), (101,2)).
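To compare the three options on that example input, here is a minimal sketch that wraps each one in a function returning its result instead of printing it. The object DupOptions and the method names option1 to option3 are hypothetical; the bodies are the question's code with println removed.

```scala
// Sketch: the question's three options, returning instead of printing.
// DupOptions and option1..option3 are hypothetical names.
object DupOptions {
  // 1) groupBy, keep groups longer than 1 via a guard, then toSeq
  def option1(dup: List[Int]): Seq[(Int, Int)] =
    dup.groupBy(identity)
      .collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }
      .toSeq

  // 2) groupBy, match groups with at least two elements, then re-count in dup
  def option2(dup: List[Int]): Iterable[(Int, Int)] =
    dup.groupBy(identity)
      .collect { case (x, List(_, _, _*)) => x }
      .map(x => (x, dup.count(y => x == y)))

  // 3) distinct values (input order kept), count each one in dup, keep counts above 1
  def option3(dup: List[Int]): List[(Int, Int)] =
    dup.distinct
      .map(a => (a, dup.count(b => a == b)))
      .filter(pair => pair._2 > 1)
}
```

Only option 3 is guaranteed to produce the pairs in first-occurrence order, because distinct keeps the order of the input. Options 1 and 2 read the groups out of a Map, whose iteration order is unspecified, so their output may need sorting before it matches List((1,3), (5,2), (101,2)) exactly.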
Based on @lutzh's answer below, the best way would be one of the following:
import scala.collection.breakOut // available up to Scala 2.12; removed in 2.13
val list: List[(Int, Int)] = dup.groupBy(identity).collect({ case (x, ys @ List(_, _, _*)) => (x, ys.size) })(breakOut)
val list2: List[(Int, Int)] = dup.groupBy(identity).collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }(breakOut)
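Since breakOut no longer exists in Scala 2.13, here is a sketch of a substitute technique (not the answer's) that compiles on both 2.12 and 2.13: iterate the grouped Map through an iterator, collect on the iterator, and materialise a List once at the end. The object and method names are hypothetical.

```scala
// Sketch: get a List without toSeq and without breakOut.
// Iterator.collect is lazy, so no intermediate Map is built for the result;
// toList builds the List in one pass at the end.
object DupsNoBreakOut {
  // Guard version, mirroring list2 above
  def withGuard(dup: List[Int]): List[(Int, Int)] =
    dup.groupBy(identity).iterator.collect {
      case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size)
    }.toList

  // Pattern-binding version, mirroring list above
  def withBinding(dup: List[Int]): List[(Int, Int)] =
    dup.groupBy(identity).iterator.collect {
      case (x, ys @ List(_, _, _*)) => (x, ys.size)
    }.toList
}
```

As with the breakOut versions, the pairs follow the Map's iteration order, which is unspecified.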
Upvotes: 1
Views: 13451
Reputation: 4965
For option 1, is there any way to avoid the final call to toSeq and return a List directly?
collect takes an implicit CanBuildFrom, so if you assign the result to a value of the desired type, you can pass breakOut explicitly:
import collection.breakOut
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x,ys) if ys.size > 1 => (x,ys.size)} )(breakOut)
collect creates a new collection (just like map) using a Builder. Usually the result type is determined by the type of the source collection. With breakOut you essentially ignore the source type and look for a builder for the result type. So when collect creates the resulting collection, it already creates the "right" type, and you don't have to traverse the result again to convert it.
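The idea of picking a builder for the result type up front can be illustrated by doing it by hand. This is a sketch of the concept, not how breakOut is implemented internally: it asks List.newBuilder for a builder of the target type and fills it during a single traversal of the grouped Map. The object name DupsWithBuilder is hypothetical.

```scala
// Sketch: build the target collection type directly with a Builder,
// so there is no second traversal to convert the result afterwards.
object DupsWithBuilder {
  def dups(dup: List[Int]): List[(Int, Int)] = {
    // A builder whose result type is List[(Int, Int)]
    val b = List.newBuilder[(Int, Int)]
    // One pass over the grouped Map, adding only groups with more than one element
    for ((x, ys) <- dup.groupBy(identity)) {
      if (ys.lengthCompare(1) > 0) b += ((x, ys.size))
    }
    b.result()
  }
}
```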
For option 2, is there any way to name the list parameter so that it can be used to append the size of the list just like I did in option 1 using ys.size?
Yes, you can bind the matched list to a variable with @:
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x, ys @ List(_, _, _*)) => (x, ys.size) } )(breakOut)
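The same @ binding works in any pattern match, not only inside collect. A minimal sketch with a hypothetical helper, sizeIfDuplicated, shows what ys @ List(_, _, _*) accepts: ys is bound to the whole list, and the pattern only matches lists with at least two elements.

```scala
object BindingDemo {
  // Hypothetical helper: ys names the entire matched list,
  // and List(_, _, _*) requires at least two elements.
  def sizeIfDuplicated(xs: List[Int]): Option[Int] = xs match {
    case ys @ List(_, _, _*) => Some(ys.size)
    case _                   => None
  }
}
```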
Which of the 3 choices is the most efficient?
Calling dup.count for each match seems inefficient, since dup then has to be traversed again for every duplicated key; I'd avoid that.
My guess would be that the guard (if ys.lengthCompare(1) > 0) takes a few cycles less than the List(_, _, _*) pattern, but I haven't measured it, and am not planning to.
Disclaimer: There may be a completely different (and more efficient) way of doing it that I can't think of right now. I'm only answering your specific questions.
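One alternative along those lines, which is not from the answer and has not been benchmarked here, is to count occurrences in a single foldLeft over dup and then keep the counts above 1. It avoids building a List per key (as groupBy does) and avoids re-traversing dup per key (as dup.count does). The object name DupsSinglePass is hypothetical.

```scala
// Sketch: count with one traversal of dup, then filter the counts.
object DupsSinglePass {
  def dups(dup: List[Int]): List[(Int, Int)] = {
    // value -> number of occurrences, built in a single pass over dup
    val counts = dup.foldLeft(Map.empty[Int, Int]) { (acc, x) =>
      acc.updated(x, acc.getOrElse(x, 0) + 1)
    }
    // one pass over the distinct keys, keeping only duplicates
    counts.iterator.filter { case (_, n) => n > 1 }.toList
  }
}
```

Like the groupBy versions, the result comes out of a Map, so the order of the pairs is unspecified.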
Upvotes: 2