Reputation: 8275
I have a java.lang.Iterable
which computes its values lazily. I am accessing it from Scala. Is there a core API way of returning only distinct values? For instance, imaging there was a filter method that also provided all results returned thus far:
val myLazyDistinctIterable = iterable.filter((previousReturnedItems, newItem) => previousReturnedItems.contains(newItem))
I guess this is not a very general case because it involves storing previously returned items, and that might be why it isn't in the core API.
I know about List.distinct
and Set
s but I want something that will not compute its elements until asked.
Upvotes: 12
Views: 11418
Reputation: 48775
Here is the code that adds .disctinct
method to Iterator
.
implicit class IteratorWrapper[T](it: Iterator[T]) {
def distinct = new Iterator[T] {
var seen = Set.empty[T]
var ahead = Option.empty[T]
def searchAhead {
while (ahead.isEmpty && it.hasNext) {
val v = it.next
if (!seen(v)) {
seen += v
ahead = Some(v)
}
}
}
def hasNext = {
searchAhead
ahead.nonEmpty
}
def next = {
searchAhead
val result = ahead.get
ahead = None
result
}
}
}
Be aware that, as it is usually so with Iterators, the original iterator is not valid after calling .distinct
on it.
Upvotes: 1
Reputation: 1326
Expanding on my comment above, but I can't test it right now:
def unique[A](it: Iterator[A]): Iterator[A] = {
val seen = mutable.Set[A]()
it.filter { a =>
if (seen(a))
false
else {
seen += a
true
}
}
}
You get the idea, at least. You would then apply this to the iterator you get from your iterable, and not get the unnecessary storage behavior of Stream
.
Upvotes: 1
Reputation: 14842
This should do the job (but I hate):
class UniqueIterable[T](i: Iterable[T]) extends Iterable[T] {
import scala.collection.mutable.Set
def iterator = new Iterator[T] {
val it = i.iterator
var nextE: Option[T] = None
val seen: Set[T] = Set.empty
def hasNext = {
popNext()
nextE.isDefined
}
def next = {
popNext()
val res = nextE.get
nextE = None
res
}
@tailrec
private def popNext() {
if (nextE.isEmpty && it.hasNext) {
val n = it.next
if (seen contains n) popNext()
else {
seen += n
nextE = Some(n)
}
}
}
}
}
Upvotes: 0
Reputation: 139058
You can use the distinct
method on Stream
. For example, if you have this Iterable
:
val it = new java.lang.Iterable[Int] {
def iterator = new java.util.Iterator[Int] {
var i = 0
var first = true
def hasNext = true
def next =
if (first) { first = false; i } else { first = true; i += 1; i - 1 }
def remove() { throw new UnsupportedOperationException("Can't remove.") }
}
}
You can write:
scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._
scala> val s = it.asScala.toStream
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res0: List[Int] = List(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
scala> val s = it.asScala.toStream.distinct
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.take(10).toList
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
We can tell that everything is appropriately lazy since the stream is infinite.
Upvotes: 12
Reputation: 14842
UPDATE Reading questions carefully is good. No laziness in this solution. Sorry.
toSet
will do exactly what you want:
Example
val it = Seq(1,2,3,4,2,4): Iterable[Int]
it.toSet
// Set(1,2,3,4)
If you feel fancy, you can convert that back to an iterable:
it.toSet.toIterable
Or, pimp the Iterable
:
implicit class UniquableIterable[T](t: Iterable[T]) {
def unique = t.toSet.toIterable
}
And then call
it.unique
Upvotes: 8