Vadim Samokhin
Vadim Samokhin

Reputation: 3456

scala Stream transformation and evaluation model

Consider a following list transformation:

List(1,2,3,4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)

It is evaluated in the following way:

List(1, 2, 3, 4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
List(11, 12, 13, 14) filter (_ % 2 == 0) map (_ * 3)
List(12, 14) map (_ * 3)
List(36, 42)

So there are three passes and with each one a new list structure created.

So, the first question: can Stream help to avoid it and if yes -- how? Can all evaluations be made in a single pass and without additional structures created?

Isn't the following Stream evaluation model correct:

Stream(1, ?) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
Stream(11, ?) filter (_ % 2 == 0) map (_ * 3)
// filter condition fail, evaluate the next element
Stream(2, ?) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)
Stream(12, ?) filter (_ % 2 == 0) map (_ * 3)
Stream(12, ?) map (_ * 3)
Stream(36, ?)
// finish

If it is, then there are the same number of passes and the same number of new Stream structures created as in the case of a List. If it is not -- then the second question: what is Stream evaluation model in particularly this type of transformation chain?

Upvotes: 1

Views: 660

Answers (4)

Keith Pinson
Keith Pinson

Reputation: 1725

Scala can filter and transform a collection in a variety of ways.

First your example:

List(1,2,3,4) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)

Could be optimized:

List(1,2,3,4) filter (_ % 2 == 0) map (v => (v+10)*3)

Or, folds could be used:

List(1,2,3,4).foldLeft(List[Int]()){ case (a,b) if b % 2 == 0 => a ++ List((b+10)*3) case (a,b) => a }

Or, perhaps a for-expression:

for( v <- List(1,2,3,4); w=v+10 if w % 2 == 0 ) yield w*3

Or, maybe the clearest to understand, a collection:

List(1,2,3,4).collect{ case v if v % 2 == 0 => (v+10)*3 }

But to address your questions about Streams; Yes, streams can be used and for large collections where what is wanted is often found early, a Stream is a good choice:

def myStream( s:Stream[Int] ): Stream[Int] = 
  ((s.head+10)*3) #:: myStream(s.tail.filter( _ % 2 == 0 ))

myStream(Stream.from(2)).take(2).toList  // An infinitely long list yields
                                         //   36 & 42 where the 3rd element
                                         //   has not been processed yet

With this Stream example the filter is only applied to the next element as it is needed, not to the entire list -- good thing, or it would never stop :)

Upvotes: 0

jwvh
jwvh

Reputation: 51271

One way to avoid intermediate collections is to use view.

List(1,2,3,4).view map (_ + 10) filter (_ % 2 == 0) map (_ * 3)

It doesn't avoid every intermediate, but it can be useful. This page has lots of info and is well worth the time.

Upvotes: 3

Eastsun
Eastsun

Reputation: 18869

No, you can't avoid it by using Stream. But you do can avoid it by using the method collect, and you should keep the idea that everytime you use a map after filter you may need a collect. Here is the code:

scala> def time(n: Int)(call : => Unit): Long = {
     |   val start = System.currentTimeMillis
     |   var cnt = n
     |   while(cnt > 0) {
     |     cnt -= 1
     |     call
     |   }
     |   System.currentTimeMillis - start
     | }
time: (n: Int)(call: => Unit)Long

scala> val xs = List.fill(10000)((math.random * 100).toInt)
xs: List[Int] = List(37, 86, 74, 1, ...)
scala> val ys = Stream(xs :_*)
ys: scala.collection.immutable.Stream[Int] = Stream(37, ?)

scala> time(10000){ xs map (_+10) filter (_%2 == 0) map (_*3) }
res0: Long = 7182

//Note call force to evaluation of the whole stream.
scala> time(10000){ ys map (_+10) filter (_%2 == 0) map (_*3) force } 
res1: Long = 17408

scala> time(10000){ xs.view map (_+10) filter (_%2 == 0) map (_*3) force }
res2: Long = 6322

scala> time(10000){ xs collect { case x if (x+10)%2 == 0 => (x+10)*3 } }
res3: Long = 2339

Upvotes: 2

Sascha Kolberg
Sascha Kolberg

Reputation: 7162

As far as I know, If you always iterate through the whole collection Stream does not help you. It will create the same number as new Streams as with the List.

Correct me if I am wrong, but I understand it as follows:

Stream is a lazy structure, so when you do:

val result = Stream(1, ?) map (_ + 10) filter (_ % 2 == 0) map (_ * 3)

the result is another stream that links to the results of the previous transformations. So if force evaluation with a foreach (or e.g. mkString)

result.foreach(println)

for each iteration the above chain is evaluated to get the current item.

However, you can reduce passes by 1, if you replace filter with withFilter. Then the filter is kind of applied with the map function.

List(1,2,3,4) map (_ + 10) withFilter (_ % 2 == 0) map (_ * 3)

You can reduce it to one pass with flatMap:

List(1,2,3,4) flatMap { x =>
  val y = x + 10
  if (y % 2 == 0) Some(y * 3) else None
}

Upvotes: 0

Related Questions