Bharath Kumar

What is the difference between Scala Stream, Scala List and Scala Sequence?

I have a scenario where I get DB data in the form of a Stream of objects, and transforming it into a sequence of objects is taking time. I am looking for an alternative that takes less time.

Mike Allen

Quick answer: a Scala stream is already a Scala sequence and does not need to be converted at all. Further explanation below...
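One plausible source of the slowness in the question is the conversion itself: a Stream is lazy, so converting it to a strict collection (for example with toList) evaluates every remaining element, whereas simply using it as a Seq evaluates nothing extra. The minimal sketch below assumes Scala 2.12 or 2.13 (Stream is deprecated from 2.13, so expect a warning there); NoConversionNeeded and evaluatedCounts are hypothetical names, and the counter stands in for real per-element work such as reading a DB row.

```scala
object NoConversionNeeded {
  // Returns (elements evaluated after treating the Stream as a Seq,
  //          elements evaluated after converting it with toList).
  def evaluatedCounts(n: Int): (Int, Int) = {
    var evaluated = 0 // hypothetical stand-in for expensive per-element work
    val stream: Stream[Int] = Stream.range(0, n).map { i =>
      evaluated += 1
      i
    }
    // A Stream already is a Seq: this is an upcast, with no copying and no extra evaluation.
    val asSeq: Seq[Int] = stream
    val afterUpcast = evaluated
    // Converting to a strict List walks and evaluates every element.
    val asList: List[Int] = asSeq.toList
    require(asList.length == n)
    (afterUpcast, evaluated)
  }

  def main(args: Array[String]): Unit =
    println(evaluatedCounts(5)) // (1,5): Stream evaluates its head eagerly, the rest only on toList
}
```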

A Scala sequence (scala.collection.Seq) is simply any collection that stores a sequence of elements in a specific order (the ordering is arbitrary, but element order doesn't change once defined).

A Scala list (scala.collection.immutable.List) is a subclass of Seq and is also the default implementation of a scala.collection.Seq. That is, Seq(1, 2, 3) is implemented as a List(1, 2, 3). Lists are strict, so any operation on a list processes all elements, one after the other, before another operation can be performed.
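A quick way to confirm that Seq(...) builds a List, while other types such as Vector are also Seqs without being Lists, is a runtime type test. This is a minimal sketch; SeqDefaults and isList are illustrative names, not library API.

```scala
object SeqDefaults {
  // True if `xs` is, at runtime, a scala.collection.immutable.List.
  def isList(xs: Seq[Int]): Boolean = xs match {
    case _: List[_] => true
    case _          => false
  }

  def main(args: Array[String]): Unit = {
    println(isList(Seq(1, 2, 3)))    // true: Seq's default implementation is List
    println(isList(Vector(1, 2, 3))) // false: Vector is a Seq, but not a List
  }
}
```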

For example, consider the following in the Scala REPL:

$ scala
Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.

scala> val xs = List(1, 2, 3)
xs: List[Int] = List(1, 2, 3)

scala> xs.map {x =>
     |   val newX = 2 * x
     |   println(s"Mapping value $x to $newX...")
     |   newX
     | }.foreach {x =>
     |   println(s"Printing value $x")
     | }
Mapping value 1 to 2...
Mapping value 2 to 4...
Mapping value 3 to 6...
Printing value 2
Printing value 4
Printing value 6

Note how each value is mapped, creating a new list (List(2, 4, 6)), before any of the values in that new list are printed.
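For a version of the session above that can be checked without reading console output, here is a small sketch that appends each step to a log instead of printing it. MapForeachTrace and trace are illustrative names. Since trace takes any Seq[Int], passing a Stream instead of a List reproduces the interleaved order shown in the Stream session further down.

```scala
import scala.collection.mutable.ArrayBuffer

object MapForeachTrace {
  // Runs the same map-then-foreach pipeline as the REPL session, but records
  // each step in a buffer instead of printing it, and returns the log.
  def trace(xs: Seq[Int]): List[String] = {
    val log = ArrayBuffer.empty[String]
    xs.map { x =>
      val newX = 2 * x
      log += s"map $x to $newX"
      newX
    }.foreach { x =>
      log += s"print $x"
    }
    log.toList
  }

  def main(args: Array[String]): Unit =
    trace(List(1, 2, 3)).foreach(println)
}
```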

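A Scala stream (scala.collection.immutable.Stream) is also a subclass of Seq, but it is lazy (or non-strict), meaning that the next value from the stream is only computed when required (strictly, a Stream evaluates its head eagerly and its tail lazily). It is often referred to as a lazy list. (In Scala 2.13 and later, Stream is deprecated in favour of scala.collection.immutable.LazyList.)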

To illustrate the difference between a Stream and a List, let's redo that example:

scala> val xs = Stream(1, 2, 3)
xs: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> xs.map {x =>
     |   val newX = 2 * x
     |   println(s"Mapping value $x to $newX...")
     |   newX
     | }.foreach {x =>
     |   println(s"Printing value $x")
     | }
Mapping value 1 to 2...
Printing value 2
Mapping value 2 to 4...
Printing value 4
Mapping value 3 to 6...
Printing value 6

Note how, for a Stream, the next element is only mapped after all of the operations for the previous element have completed. The map operation still returns a new stream (Stream(2, 4, 6)), but its values are only computed when needed.
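Because a Stream only computes elements on demand, it can even be infinite. In this minimal sketch (LazyStreamDemo is an illustrative name), Stream.from(1) produces an endless stream of integers; mapping it and taking three elements runs the mapping function exactly three times, which a List could not do since it must hold every element.

```scala
object LazyStreamDemo {
  // Maps an infinite Stream and keeps only the first three results.
  // Returns those results plus the number of times the mapping function ran.
  def firstThreeDoubled(): (List[Int], Int) = {
    var calls = 0
    val doubled: Stream[Int] = Stream.from(1).map { x =>
      calls += 1
      2 * x
    }
    val firstThree = doubled.take(3).toList
    (firstThree, calls)
  }

  def main(args: Array[String]): Unit =
    println(firstThreeDoubled()) // (List(2, 4, 6),3)
}
```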

Whether a Stream performs better than a List in any particular situation will depend upon what you're trying to do. If performance is your primary goal, I suggest that you benchmark your code (using a tool such as ScalaMeter) to determine which type works best.
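ScalaMeter is the better tool for real measurements. As a rough, dependency-free first look (this is not ScalaMeter and not a substitute for it; it only does a few warm-up runs and takes a median), a hypothetical helper like the one below can time the same operation on a List and a Stream. The data size, run counts and the groupBy workload are illustrative assumptions.

```scala
object RoughTiming {
  // Rough wall-clock timing: runs `body` `warmups` times untimed (to give the
  // JIT a chance to compile it), then `runs` timed times, and returns the
  // median duration in milliseconds.
  def medianMillis[A](warmups: Int, runs: Int)(body: => A): Double = {
    require(runs > 0, "runs must be positive")
    (1 to warmups).foreach(_ => body)
    val samples: Vector[Long] = (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }.toVector.sorted
    samples(samples.length / 2) / 1e6
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val listMs   = medianMillis(5, 11)(List.range(0, n).groupBy(_ % 10))
    val streamMs = medianMillis(5, 11)(Stream.range(0, n).groupBy(_ % 10))
    println(f"List groupBy: $listMs%.2f ms, Stream groupBy: $streamMs%.2f ms")
  }
}
```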

BTW, since both Stream and List are subclasses of Seq, it is common practice to declare parameters that require a sequence as Seq. That way, you can supply a List, a Stream or any other Seq subclass without having to change your code, and without having to convert lists, streams, etc. to sequences. For example:

def doSomethingWithSeq[T](seq: Seq[T]): Unit = {
  // Any Seq subtype (List, Stream, Vector, ...) is accepted here.
  seq.foreach(println)
}

// This works!
val list = List(1, 2, 3)
doSomethingWithSeq(list)

// This works too!
val stream = Stream(4, 5, 6)
doSomethingWithSeq(stream)

UPDATED

The performance of List vs. Stream for a groupBy operation is going to be very similar. Depending upon how it's used, a Stream can require less memory than a List, but might require a little extra CPU time. If collection performance is definitely the issue, benchmark both types of collection (see above) and measure precisely to determine the trade-offs between the two. I cannot make that determination for you. It's possible that the slowness you refer to is down to the transmission of data between the database and your application, and has nothing to do with the collection type.
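One reason the two perform similarly here: groupBy has to look at every element to build its result, so calling it on a Stream evaluates the whole stream, just as it walks the whole List, and laziness buys nothing for that step. A small sketch with hypothetical names, using a counter in place of real per-element work, makes this visible and checks that both types produce the same groups.

```scala
object GroupByForcesStream {
  // Counts how many elements of a lazily mapped Stream have been evaluated
  // once groupBy has been called on it.
  def evaluatedAfterGroupBy(n: Int): Int = {
    var evaluated = 0
    val ids: Stream[Int] = Stream.range(0, n).map { i =>
      evaluated += 1
      i
    }
    val groups = ids.groupBy(_ % 3)
    require(groups.valuesIterator.map(_.size).sum == n)
    evaluated
  }

  // Groups the same values as a List and as a Stream and checks that the
  // groups contain the same elements in the same order.
  def sameGroups(n: Int): Boolean = {
    val fromList   = List.range(0, n).groupBy(_ % 3)
    val fromStream = Stream.range(0, n).groupBy(_ % 3).map { case (k, v) => k -> v.toList }
    fromList == fromStream
  }

  def main(args: Array[String]): Unit = {
    println(evaluatedAfterGroupBy(10)) // 10: every element was evaluated
    println(sameGroups(10))            // true
  }
}
```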

For general information on Scala collection performance, refer to Collections: Performance Characteristics.

UPDATED 2

Also note that any type of Scala sequence will typically be processed sequentially (hence the name), by a single thread at a time. Neither List nor Stream lends itself to parallel processing of its elements. If you need to process a collection in parallel, you'll need a parallel collection type (one of the collections in scala.collection.parallel; since Scala 2.13 these live in the separate scala-parallel-collections module). A scala.collection.parallel.ParSeq should process groupBy faster than a List or a Stream, but only if you have multiple cores/hyperthreads available. However, ParSeq operations are not guaranteed to preserve the order of the grouped-by elements.
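ParSeq needs the scala-parallel-collections module on Scala 2.13 and later, so the sketch below swaps in a different, hand-rolled technique that compiles with the standard library alone: it splits the data into chunks, runs groupBy on each chunk in a scala.concurrent.Future, and merges the partial maps in chunk order. This is not how ParSeq works internally; chunkedGroupBy and the chunk size are hypothetical, and whether it beats a single-threaded groupBy depends on data size and core count, so benchmark it. Because the merge follows chunk order, each group keeps the input order, unlike the ParSeq caveat above.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ChunkedGroupBy {
  // Hypothetical helper: groups `data` by `key`, running groupBy on each chunk
  // in its own Future (on the global thread pool), then merging the partial
  // maps sequentially in chunk order so each group keeps the input order.
  def chunkedGroupBy[A, K](data: Seq[A], chunkSize: Int)(key: A => K): Map[K, Vector[A]] = {
    require(chunkSize > 0, "chunkSize must be positive")
    val partials: List[Future[Map[K, Seq[A]]]] =
      data.grouped(chunkSize).toList.map(chunk => Future(chunk.groupBy(key)))
    val maps: List[Map[K, Seq[A]]] = Await.result(Future.sequence(partials), Duration.Inf)
    maps.foldLeft(Map.empty[K, Vector[A]]) { (acc, partial) =>
      partial.foldLeft(acc) { case (merged, (k, vs)) =>
        val group: Vector[A] = merged.getOrElse(k, Vector.empty[A]) ++ vs
        merged.updated(k, group)
      }
    }
  }

  def main(args: Array[String]): Unit =
    println(chunkedGroupBy((1 to 10).toList, chunkSize = 3)(_ % 3))
}
```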
