Muhammad Hewedy
Muhammad Hewedy

Reputation: 30058

Does Scala has intermediate/terminal ops as Java8 has?

In Java8

When I write such code:

Stream<Integer> xs = Arrays.asList(1, 3, 5, 6, 7, 10).stream();
xs.map(x -> x * x).filter (x -> x > 15).forEach(System.out::println);

Java8 streams are divided into two sections; intermediate vs terminal operations, where the -AFAIK - actual action (under-the-hood iterations) is done in the terminal ops, while each intermediate ops appends its own -let me name it- Apply inner classes.

By this, there will be only one iteration over the list.

Sample code from JDK8:

@Override
@SuppressWarnings("unchecked")
public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    Objects.requireNonNull(mapper);
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void accept(P_OUT u) {
                    downstream.accept(mapper.apply(u));
                }
            };
        }
    };
}

In Scala

When I write such code:

val xs = List(1, 3, 5, 6, 7, 10) 
xs map (x => x * x) filter (x => x > 15) foreach (println)

I have been a while reading about it, but I never hear about such terms explicitly, moreover, the SDK implementation loops (either using recursion or regular loops) on each operation:

final override def map[B, That](f: A => B)(implicit bf: CanBuildFrom[List[A], B, That]): That = {
if (bf eq List.ReusableCBF) {
  if (this eq Nil) Nil.asInstanceOf[That] else {
    val h = new ::[B](f(head), Nil)
    var t: ::[B] = h
    var rest = tail
    while (rest ne Nil) {
      val nx = new ::(f(rest.head), Nil)
      t.tl = nx
      t = nx
      rest = rest.tail
    }
    h.asInstanceOf[That]
  }
}
else super.map(f)
}

My questions is:

Can we consider that Java implementation on the same thing would be much faster. (O(n) in Java vs O(multiples of n) in Scala)

Upvotes: 12

Views: 392

Answers (2)

Rex Kerr
Rex Kerr

Reputation: 167891

Java 8 streams are less-feature-complete cousins of Scala iterators save for their ability to compute in parallel.

If you don't need parallel computation (and most times the overhead is not worth it--only for big expensive jobs do you want it), then you can get the same sort of processing with .iterator in Scala (and then to[Vector] or whatever you want at the end).

Java 8 streams are manually specialized (Scala Iterator is not), so there are use cases where they are way faster, but it's not because of the constant factor due to recreating collections along the way--at least, not if you throw a .iterator in there. (Without .iterator, Scala collections evaluate eagerly by default; Java collections do not have that option.)

The Scala equivalent for the Java 8 code you wrote is the following:

val xsi = Array(1, 3, 5, 6, 7, 10).iterator
xsi.map(x => x*x).filter(_ > 15).foreach(println)

There is no difference in the number of collections created here with Scala vs. Java.

It would probably be a good idea to adopt the very clear "terminal operation" language for Scala's Iterator documentation. The Java 8 stream docs are great in that they make it exquisitely clear when you're building up the description of the work and when you're finally doing it.

Scala also provides a Stream class that memoizes old work (so you don't have to compute it a second time if you're reusing it), and various views so you don't have to recreate the processing chain each time you want to use it. For example, with your squaring, you could

val xsv = Array(1, 3, 5, 6, 7, 10).view
val xsq = xsv.map(x => x*x)
xsq.filter(_ > 15).foreach(println)
xsq.filter(_ < 5).foreach(println)

whereas with Java 8 streams xsq would be exhausted after the first terminal operation.

So Scala actually does everything (save parallelism) that Java 8 streams do, and quite a bit more besides, and has for a long time.

Scala also has parallelizing collections, but the Java 8 implementation is sufficiently superior in performance that at this point I'd recommend using them first. And again, if manual specialization is your thing, Java 8 streams have it for Int, Double, and Long, and that's a big performance win. (Note: your example, using asList, is not manually specialized.)

But if you just want to queue up operations and not have the overhead of building intermediate collections, Scala does it. You just have to ask.

Upvotes: 12

marstran
marstran

Reputation: 28036

The list in Scala is eager, which means that (as you say) there are multiple iterations over the list. There are (at least) two ways to get around this.

Using a view:

xs.view.map(x => x * x).filter(x => x > 15).force

Or by converting the list to a stream (which is lazy):

xs.toStream.map(x => x * x).filter(x => x > 15)

Upvotes: 9

Related Questions