Dan Gravell
Dan Gravell

Reputation: 8275

How to get distinct items from a Scala Iterable, maintaining laziness

I have a java.lang.Iterable which computes its values lazily. I am accessing it from Scala. Is there a core API way of returning only distinct values? For instance, imaging there was a filter method that also provided all results returned thus far:

val myLazyDistinctIterable = iterable.filter((previousReturnedItems, newItem) => previousReturnedItems.contains(newItem))

I guess this is not a very general case because it involves storing previously returned items, and that might be why it isn't in the core API.

I know about List.distinct and Sets but I want something that will not compute its elements until asked.

Upvotes: 12

Views: 11418

Answers (5)

Rok Kralj
Rok Kralj

Reputation: 48775

Here is the code that adds .disctinct method to Iterator.

implicit class IteratorWrapper[T](it: Iterator[T]) {
    def distinct = new Iterator[T] {
        var seen = Set.empty[T]
        var ahead = Option.empty[T]

        def searchAhead {
            while (ahead.isEmpty && it.hasNext) {
                val v = it.next
                if (!seen(v)) {
                    seen += v
                    ahead = Some(v)
                }
            }
        }

        def hasNext = {
            searchAhead
            ahead.nonEmpty
        }

        def next = {
            searchAhead
            val result = ahead.get
            ahead = None
            result
        }
    }
}

Be aware that, as it is usually so with Iterators, the original iterator is not valid after calling .distinct on it.

Upvotes: 1

Mysterious Dan
Mysterious Dan

Reputation: 1326

Expanding on my comment above, but I can't test it right now:

def unique[A](it: Iterator[A]): Iterator[A] = {
  val seen = mutable.Set[A]()
  it.filter { a =>
    if (seen(a))
      false
    else {
      seen += a
      true
    }
  }
}

You get the idea, at least. You would then apply this to the iterator you get from your iterable, and not get the unnecessary storage behavior of Stream.

Upvotes: 1

gzm0
gzm0

Reputation: 14842

This should do the job (but I hate):

class UniqueIterable[T](i: Iterable[T]) extends Iterable[T] {
  import scala.collection.mutable.Set
  def iterator = new Iterator[T] {
    val it = i.iterator
    var nextE: Option[T] = None
    val seen: Set[T] = Set.empty
    def hasNext = {
      popNext()
      nextE.isDefined
    }
    def next = {
      popNext()
      val res = nextE.get
      nextE = None
      res
    }

    @tailrec
    private def popNext() {
      if (nextE.isEmpty && it.hasNext) {
        val n = it.next
        if (seen contains n) popNext()
        else {
          seen += n
          nextE = Some(n)
        }
      }
    }
  }
}

Upvotes: 0

Travis Brown
Travis Brown

Reputation: 139058

You can use the distinct method on Stream. For example, if you have this Iterable:

val it = new java.lang.Iterable[Int] {
  def iterator = new java.util.Iterator[Int] {
    var i = 0
    var first = true

    def hasNext = true
    def next =
      if (first) { first = false; i } else { first = true; i += 1; i - 1 }
    def remove() { throw new UnsupportedOperationException("Can't remove.") }
  }
}

You can write:

scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._

scala> val s = it.asScala.toStream
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)

scala> s.take(10).toList
res0: List[Int] = List(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)

scala> val s = it.asScala.toStream.distinct
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)

scala> s.take(10).toList
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

We can tell that everything is appropriately lazy since the stream is infinite.

Upvotes: 12

gzm0
gzm0

Reputation: 14842

UPDATE Reading questions carefully is good. No laziness in this solution. Sorry.

toSet will do exactly what you want:

  1. Store iterated elements in a collection (not what you want but required)
  2. Drop / Replace duplicates

Example

val it = Seq(1,2,3,4,2,4): Iterable[Int]
it.toSet
// Set(1,2,3,4)

If you feel fancy, you can convert that back to an iterable:

it.toSet.toIterable

Or, pimp the Iterable:

implicit class UniquableIterable[T](t: Iterable[T]) {
  def unique = t.toSet.toIterable
}

And then call

it.unique

Upvotes: 8

Related Questions