DNA
DNA

Reputation: 42586

Creating an iterator of subsequences from an iterator of items

If I have a Seq, then it is easy to generate all the subsequences up to a given length limit, as follows:

def subseqs[A](n: Int)(s: Seq[A]) = {
  (1 to n).flatMap(s.sliding)
}                                            

subseqs(3)(List(1, 2, 3, 4)) foreach println    //> List(1)
                                                //| List(2)
                                                //| List(3)
                                                //| List(4)
                                                //| List(1, 2)
                                                //| List(2, 3)
                                                //| List(3, 4)
                                                //| List(1, 2, 3)
                                                //| List(2, 3, 4)

However, is there an idiomatic (and reasonably efficient) way to do the same with an iterator as input, producing an iterator (or perhaps a Stream) as output?

Updated: I have a working implementation that implements Iterator, and does this in a single pass, so needs very little memory, but is relatively long and uses mutable variables (var and ListBuffer) - can post this if it would help. I am hoping there is a more elegant way using higher-order functions...

The approach above (using sliding()) won't work because the iterator is exhausted by the first pass, and cannot be re-used.

Using a combination of sliding() and inits() is better, but misses off the tail-end of the expected subsequences:

def subseqsi[A](n: Int)(i: Iterator[A]) = {
  //(1 to n).flatMap(i.sliding) 
  // no - this exhausts the iterator

  i.sliding(n).flatMap { _.inits.filterNot(_.isEmpty) } 
  //nearly, this misses off subsequences towards the end
} 
                                              //> List(1, 2, 3)
                                              //| List(1, 2)
                                              //| List(1)
                                              //| List(2, 3, 4)
                                              //| List(2, 3)
                                              //| List(2)

My input data is an Iterator of unknown (potentially very large) size. The order of the output subsequences does not matter.

Upvotes: 3

Views: 506

Answers (2)

DNA
DNA

Reputation: 42586

Here's another approach using Streams - much less elegant than the flatMap/sliding approach (which it uses for the final few elements), but much faster and less memory-hungry.

def subseqs[A](n: Int)(s: Iterator[A]) = {
  def loop(s: Stream[A], history: Vector[A]): Stream[Seq[A]] = {
    if (s.isEmpty) (1 to n).flatMap(history.sliding).toStream
    else if (history.length == n) 
      history.inits.filterNot(_.isEmpty).toStream #::: loop(s.tail, history.tail :+ s.head)
    else loop(s.tail, history :+ s.head)
  }
  loop(s.toStream, Vector())
}

This builds up a history until there are n elements, outputs the inits, then shifts the history along. However, not only is the code uglier, it also returns the results in an ugly order, due to the need to handle the end of the stream as a special case:

subseqs(3)(List(1, 2, 3, 4)) foreach println  > Vector(1, 2, 3)
                                              | Vector(1, 2)
                                              | Vector(1)
                                              | Vector(2)
                                              | Vector(3)
                                              | Vector(4)
                                              | Vector(2, 3)
                                              | Vector(3, 4)
                                              | Vector(2, 3, 4)

I'm still hopeful that an elegant and fast solution is possible...

Upvotes: 0

kiritsuku
kiritsuku

Reputation: 53348

Just use a Stream:

def subseqs[A](n: Int)(iter: Iterator[A]) = {
  val i = iter.toStream
  1 to n flatMap i.sliding
}

A Stream is lazy as an Iterator is, but it stores all the values that are already computed.

Upvotes: 2

Related Questions