davidrpugh
davidrpugh

Reputation: 4553

Complexity of mapping identity function to a Scala collection?

When I apply the Scala predefined identity function to a collection using the map method the original collection is returned unchanged. However, is the compiler smart enough to simply return the unchanged collection as an O(1) operation? Or will the identity function still be applied to each element in the original collection yielding a O(n) operation?

Upvotes: 1

Views: 708

Answers (4)

Jörg W Mittag
Jörg W Mittag

Reputation: 369430

When I apply the Scala predefined identity function to a collection using the map method the original collection is returned unchanged.

No, it isn't. A new collection with identical contents is returned. Constructing this new collection will typically be O(n).

However, is the compiler smart enough to simply return the unchanged collection as an O(1) operation? Or will the identity function still be applied to each element in the original collection yielding a O(n) operation?

In order to perform such an optimization, the compiler will have to decide that the function to be applied is extensionally equal to the identity function. This problem is called the Function Problem and is known to be undecidable. (It can be proven using the Halting Problem, for example.)

Of course, it would be possible to optimize for the specific function Predef.identity, not just any identity function. But the Scala compiler designers are not fond of such one-off special-case optimizations, that only help standard library code. They prefer general optimizations that benefits all code.

Upvotes: 2

jwvh
jwvh

Reputation: 51271

Rex Kerr's handy Thyme utility confirms Alec's findings. The identity runtime is roughly proportional to the collection size.

val smallC = Vector.tabulate(90)(_*2)
val bigC = Vector.tabulate(900)(_*2)

val th = ichi.bench.Thyme.warmed(verbose = print)
th.pbenchOffWarm("A vs. B")(th.Warm(smallC.map(identity)))(th.Warm(bigC.map(identity)))

Benchmark comparison (in 4.694 s): A vs. B
Significantly different (p ~= 0)
  Time ratio:    9.31267   95% CI 9.25599 - 9.36935   (n=20)
    First     1.492 us   95% CI 1.487 us - 1.496 us
    Second    13.89 us   95% CI 13.82 us - 13.96 us

Upvotes: 2

Alec
Alec

Reputation: 32309

It is pretty straightforward to check that this is not the case. Start by making a test file with possibly optimized forms and compile it using scalac (with or without -optimise)

/// TestMap.scala
object TestMap {
  def mapOption[T](o: Option[T]): Option[T] = o.map(identity)
  def mapList[T](l: List[T]): List[T] = l.map(identity)
  def mapSeq[T](l: Seq[T]): Seq[T] = l.map(identity)
}

Then, checking javap -c TestMap.class you can see that nothing got optimized past specializing map to mapSeq, mapList, or mapOption:

Compiled from "TestMap.scala"
public final class TestMap {
  public static <T extends java/lang/Object> scala.collection.Seq<T> mapSeq(scala.collection.Seq<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #18                 // Method TestMap$.mapSeq:(Lscala/collection/Seq;)Lscala/collection/Seq;
       7: areturn       

  public static <T extends java/lang/Object> scala.collection.immutable.List<T> mapList(scala.collection.immutable.List<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #22                 // Method TestMap$.mapList:(Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
       7: areturn       

  public static <T extends java/lang/Object> scala.Option<T> mapOption(scala.Option<T>);
    Code:
       0: getstatic     #16                 // Field TestMap$.MODULE$:LTestMap$;
       3: aload_0       
       4: invokevirtual #26                 // Method TestMap$.mapOption:(Lscala/Option;)Lscala/Option;
       7: areturn  

More simply, this sort of optimization just doesn't extend well in languages with side-effects (on the other hand, in Haskell, this sort of thing happens regularly). Should the compiler optimise l.map(x => { println(x); x }) to l or not for example?

Upvotes: 4

Samar
Samar

Reputation: 2101

Measuring execution time seems to indicate that identity function is O(n):

Function for measuring code execution time, from this link:

def time[R](block: => R): R = {  
    val t0 = System.nanoTime()
    val result = block    // call-by-name
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0) + "ns")
    result
}

time {(1 to 100000000).map(identity)}  // Elapsed time: 8893077396ns

time {(1 to 10).map(identity)} // Elapsed time: 341129ns

// while only creation of range takes similar order of magnitude times. 

time {(1 to 10)}  // Elapsed time: 30250ns

time {(1 to 100000000)} // Elapsed time: 32351ns

Upvotes: 0

Related Questions