uzilan
uzilan

Reputation: 2624

Why is Stream.distinct not distinct?

I'm trying the following code and am surprised to find duplicates. I thought that Stream.distinct would only return distinct values. What am I doing wrong here?

import scala.collection.mutable
import scala.util.Random.shuffle

object DistinctStreamTest {
   def rand(): Stream[Int] = Stream.cons(shuffle(1 to 1000).head, rand()).distinct

   def main(args: Array[String]): Unit = {
      val usedNumbers = mutable.HashSet.empty[String]
      (0 to 100).foreach(i => {
         val newNumber = rand().take(1).mkString
         if (usedNumbers.contains(newNumber)) {
            println(s"Found duplicates: $newNumber after $i iterations")
         }
         usedNumbers += newNumber
      })
   }
}

Upvotes: 1

Views: 554

Answers (2)

jwvh
jwvh

Reputation: 51271

The reason you're getting duplicates is that you've defined rand() as a def. That means that every invocation creates a new and different Stream[Int]. You're only taking the 1st element from each new Stream but the likelihood of repetition is pretty high.

You could fix it by making rand a val instead, but you've defined it recursively and so trying to access anything after the head will cause an infinite recursion.

The proper way to get a random sequence of distinct values:

val sq :Seq[Int] = util.Random.shuffle(1 to 1000)

Then to access the values you might:

(0 to 100).foreach(i => {
  val newNumber = sq(i).toString()
  ... //etc.

Upvotes: 3

erip
erip

Reputation: 16935

You've defined rand() to be an infinitely recursive Stream; since it cannot be materialized, it cannot be distinctified.

scala> def rand(): Stream[Int] = Stream.cons(shuffle(1 to 1000).head, rand()).distinct
rand: ()Stream[Int]

scala> rand().toList.length
java.lang.StackOverflowError
  at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:61)
  at scala.collection.IterableLike.copyToArray(IterableLike.scala:252)
  at scala.collection.IterableLike.copyToArray$(IterableLike.scala:247)
  at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:99)
  at scala.util.Random.shuffle(Random.scala:108)

Streams are lazy, so taking things from them is fine without blowing the stack.

Upvotes: 1

Related Questions