Irina Rapoport

Reputation: 1692

How do I efficiently count distinct fields in a collection?

I am currently doing this:

val count = sightings.map(_.shape).distinct.length

However, map creates an intermediate collection, which in my case is a Vector thousands of times larger than what distinct produces.

How do I bypass this intermediate step and get the set of distinct shapes? Or, even better, the count of distinct shapes.

Upvotes: 1

Views: 543

Answers (3)

stefanobaghino

Reputation: 12814

You can use an iterator to avoid creating the intermediate collection, and then accrue the shapes in a Set to keep only the distinct ones:

val count = sightings.iterator.map(_.shape).toSet.size
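A self-contained sketch of the iterator approach. The question does not show its element type, so Sighting below is a hypothetical stand-in whose shape is just a String. Because iterator.map is lazy, each shape is computed on demand and added straight to the Set, so only the distinct shapes are ever stored.

```scala
// Hypothetical stand-in for the question's element type; a shape is a String here.
final case class Sighting(id: Int, shape: String)

object IteratorCount {
  // iterator.map is lazy: each shape is produced on demand and added to the
  // Set, so no Vector holding every shape is ever built.
  def distinctShapeCount(sightings: Vector[Sighting]): Int =
    sightings.iterator.map(_.shape).toSet.size

  def main(args: Array[String]): Unit = {
    val sightings = Vector(
      Sighting(1, "circle"),
      Sighting(2, "disk"),
      Sighting(3, "circle"),
      Sighting(4, "triangle")
    )
    println(distinctShapeCount(sightings)) // prints 3
  }
}
```

Note that an Iterator can be traversed only once; calling sightings.iterator afresh each time avoids that pitfall.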

Alternatively, in Scala 2.12 and earlier, you can use collection.breakOut to accrue the items directly in a Set without creating the intermediate collection (another answer also suggests breakOut, but uses it differently):

val distinctShapes: Set[Shape] = sightings.map(_.shape)(collection.breakOut)
val count = distinctShapes.size
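collection.breakOut no longer exists in Scala 2.13, so the snippet above only compiles on 2.12 and earlier. A minimal sketch of the 2.13+/Scala 3 equivalent, assuming those versions: map through a lazy view and build the Set directly with to(Set). The helper name distinctKeys is hypothetical, and the (id, shape) tuples stand in for the question's sightings.

```scala
object DistinctViaView {
  // Scala 2.13+ replacement for breakOut: the view maps lazily, and
  // to(Set) builds the target collection in a single pass.
  def distinctKeys[A, K](xs: Iterable[A])(key: A => K): Set[K] =
    xs.view.map(key).to(Set)

  def main(args: Array[String]): Unit = {
    // Hypothetical (id, shape) pairs standing in for sightings.
    val sightings = Vector((1, "circle"), (2, "disk"), (3, "circle"), (4, "triangle"))
    println(distinctKeys(sightings)(_._2).size) // prints 3
  }
}
```

sightings.iterator.map(_.shape).to(Set) works the same way; the view form is shown because it reads like the original map call.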

Upvotes: 4

fcat

Reputation: 1251

In addition to the other answers, breakOut (Scala 2.12 and earlier) solves exactly this problem.

Example usage:

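 import scala.collection.breakOut
 // Ascribing the result type as Set[Shape] tells breakOut which collection
 // to build, so map fills the Set directly (Scala 2.12 and earlier).
 val count = (sightings.map(_.shape)(breakOut): Set[Shape]).size

Here, breakOut makes map build the target Set directly, so no intermediate collection of all shapes is created. The target type has to be given explicitly: without it, breakOut does not produce a Set, and a following .distinct would build yet another collection.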

See the Scala 2.12 collections API documentation for breakOut and CanBuildFrom for more information.

Upvotes: 2

jwvh

Reputation: 51271

One approach is to remove the duplicates as you go, then count the results.

sightings.foldLeft(Set[Shape]()){case (ss,sight) => ss + sight.shape}.size

The intermediate Set of shapes is only ever as large as the number of distinct shapes encountered so far.
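A sketch generalizing the fold to any key function, alongside a variant that swaps in a local mutable HashSet (a different technique from the answer, shown for comparison). Both keep only the distinct keys in memory. The helper names and the (id, shape) tuples are hypothetical.

```scala
import scala.collection.mutable

object FoldCount {
  // The answer's approach, generalized: fold each key into an immutable Set.
  def countDistinctFold[A, K](xs: Iterable[A])(key: A => K): Int =
    xs.foldLeft(Set.empty[K])((seen, x) => seen + key(x)).size

  // Variant: accumulate keys in a local mutable HashSet instead of folding.
  def countDistinctMutable[A, K](xs: Iterable[A])(key: A => K): Int = {
    val seen = mutable.HashSet.empty[K]
    xs.foreach(x => seen += key(x))
    seen.size
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (id, shape) pairs standing in for sightings.
    val sightings = Vector((1, "circle"), (2, "disk"), (3, "circle"), (4, "triangle"))
    println(countDistinctFold(sightings)(_._2))    // prints 3
    println(countDistinctMutable(sightings)(_._2)) // prints 3
  }
}
```

The mutable set never escapes the method, so the function stays referentially transparent from the caller's point of view.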

Upvotes: 3
