I have a scenario where I get DB data in the form of a Stream of objects, and transforming it into a sequence of objects is taking time. I am looking for an alternative that takes less time.
Quick answer: a Scala stream is already a Scala sequence and does not need to be converted at all. Further explanation below...
A Scala sequence (scala.collection.Seq) is simply any collection that stores a sequence of elements in a specific order (the ordering is arbitrary, but element order doesn't change once defined).
A Scala list (scala.collection.immutable.List) is a subclass of Seq and is also the default implementation of a scala.collection.Seq. That is, Seq(1, 2, 3) is implemented as a List(1, 2, 3). Lists are strict, so any operation on a list processes all elements, one after the other, before another operation can be performed.
For example, consider this session in the Scala REPL:
$ scala
Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.
scala> val xs = List(1, 2, 3)
xs: List[Int] = List(1, 2, 3)
scala> xs.map {x =>
| val newX = 2 * x
| println(s"Mapping value $x to $newX...")
| newX
| }.foreach {x =>
| println(s"Printing value $x")
| }
Mapping value 1 to 2...
Mapping value 2 to 4...
Mapping value 3 to 6...
Printing value 2
Printing value 4
Printing value 6
Note how each value is mapped, creating a new list (List(2, 4, 6)), before any of the values of that new list are printed out?
A Scala stream (scala.collection.immutable.Stream) is also a subclass of Seq, but it is lazy (or non-strict), meaning that the next value from the stream is only taken when required. It is often referred to as a lazy list.
To illustrate the difference between a Stream and a List, let's redo that example:
scala> val xs = Stream(1, 2, 3)
xs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> xs.map {x =>
| val newX = 2 * x
| println(s"Mapping value $x to $newX...")
| newX
| }.foreach {x =>
| println(s"Printing value $x")
| }
Mapping value 1 to 2...
Printing value 2
Mapping value 2 to 4...
Printing value 4
Mapping value 3 to 6...
Printing value 6
Note how, for a Stream, we only process the next map operation after all of the operations for the previous element have been completed? The map operation still returns a new stream (Stream(2, 4, 6)), but values are only taken when needed.
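The same laziness can be seen without any printing. The following is a minimal sketch, assuming Scala 2.12 as in the REPL session above (in 2.13 and later, Stream still works but is deprecated in favour of LazyList). The object name LazyStreamDemo and the evaluated counter are illustrative only and do not come from the question.

```scala
object LazyStreamDemo {
  // Counts how many times the mapping function actually runs.
  var evaluated = 0

  // An infinite stream: mapping it would never finish if Stream were strict.
  val doubled: Stream[Int] = Stream.from(1).map { x =>
    evaluated += 1
    2 * x
  }

  // Forcing the first three elements runs the mapping function
  // only for the elements that are actually needed.
  val firstThree: List[Int] = doubled.take(3).toList
}
```

Pasting this into the REPL and evaluating LazyStreamDemo.firstThree followed by LazyStreamDemo.evaluated shows that mapping an infinite stream terminates, because only the requested elements are ever computed.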
Whether a Stream performs better than a List in any particular situation will depend upon what you're trying to do. If performance is your primary goal, I suggest that you benchmark your code (using a tool such as ScalaMeter) to determine which type works best.
BTW, since both Stream and List are subclasses of Seq, it is common practice to write code that requires a sequence to utilize Seq. That way, you can supply a List or a Stream or any other Seq subclass, without having to change your code, and without having to convert lists, streams, etc. to sequences. For example:
def doSomethingWithSeq[T](seq: Seq[T]) = {
//
}
// This works!
val list = List(1, 2, 3)
doSomethingWithSeq(list)
// This works too!
val stream = Stream(4, 5, 6)
doSomethingWithSeq(stream)
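To make that concrete, here is a small sketch with an actual body. The helper groupByParity is hypothetical (it is not from the question); it simply stands in for whatever Seq-based processing you need, such as a groupBy. It is written against Scala 2.12; on 2.13 and later it should still compile, with a deprecation warning for Stream.

```scala
object SeqApiDemo {
  // Hypothetical helper: accepts any Seq, so callers never need to convert.
  def groupByParity(seq: Seq[Int]): Map[Boolean, Seq[Int]] =
    seq.groupBy(_ % 2 == 0)

  // The same function is called with a List and with a Stream.
  val fromList   = groupByParity(List(1, 2, 3, 4))
  val fromStream = groupByParity(Stream(1, 2, 3, 4))
}
```

Both calls produce the same groups; the only difference is the concrete collection type of the values inside the resulting Map.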
UPDATED
The performance of List vs. Stream for a groupBy operation is going to be very similar. Depending upon how it's used, a Stream can require less memory than a List, but might require a little extra CPU time. If collection performance is definitely the issue, benchmark both types of collection (see above) and measure precisely to determine the trade-offs between the two. I cannot make that determination for you. It's possible that the slowness you refer to is down to the transmission of data between the database and your application, and has nothing to do with the collection type.
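As a very rough first look before setting up a proper benchmark, something like the sketch below can compare the two. The timed helper, the object name and the test data are all assumptions made for illustration. A single wall-clock measurement like this ignores JIT warm-up and garbage collection, so treat the numbers as a sanity check only, not as a replacement for a benchmarking tool.

```scala
object GroupByTiming {
  // Crude timing helper: evaluates `body` once and returns
  // the result together with the elapsed time in nanoseconds.
  def timed[A](body: => A): (A, Long) = {
    val start  = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Illustrative data; real DB rows would go here instead.
  val data: List[Int] = (1 to 10000).toList

  // Group the same elements once as a List and once as a Stream.
  val (listGroups, listNanos)     = timed(data.groupBy(_ % 10))
  val (streamGroups, streamNanos) = timed(data.toStream.groupBy(_ % 10))
}
```

Comparing GroupByTiming.listNanos with GroupByTiming.streamNanos over several runs gives a first impression; if the difference is small compared with the time spent fetching the data, the collection type is probably not the bottleneck.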
For general information on Scala collection performance, refer to Collections: Performance Characteristics.
UPDATED 2
Also note that any type of Scala sequence will typically be processed sequentially (hence the name), by a single thread at a time. Neither List nor Stream lends itself to parallel processing of its elements. If you need to process a collection in parallel, you'll need a parallel collection type (one of the collections in scala.collection.parallel). A scala.collection.parallel.ParSeq should process groupBy faster than a List or a Stream, but only if you have multiple cores/hyperthreads available. However, ParSeq operations are not guaranteed to preserve the order of the grouped-by elements.