Reputation: 4553
When I apply the Scala predefined identity
function to a collection using the map
method the original collection is returned unchanged. However, is the compiler smart enough to simply return the unchanged collection as an O(1)
operation? Or will the identity function still be applied to each element in the original collection yielding a O(n)
operation?
Upvotes: 1
Views: 708
Reputation: 369430
When I apply the Scala predefined
identity
function to a collection using themap
method the original collection is returned unchanged.
No, it isn't. A new collection with identical contents is returned. Constructing this new collection will typically be O(n).
However, is the compiler smart enough to simply return the unchanged collection as an
O(1)
operation? Or will the identity function still be applied to each element in the original collection yielding aO(n)
operation?
In order to perform such an optimization, the compiler will have to decide that the function to be applied is extensionally equal to the identity function. This problem is called the Function Problem and is known to be undecidable. (It can be proven using the Halting Problem, for example.)
Of course, it would be possible to optimize for the specific function Predef.identity
, not just any identity function. But the Scala compiler designers are not fond of such one-off special-case optimizations, that only help standard library code. They prefer general optimizations that benefits all code.
Upvotes: 2
Reputation: 51271
Rex Kerr's handy Thyme utility confirms Alec's findings. The identity
runtime is roughly proportional to the collection size.
val smallC = Vector.tabulate(90)(_*2)
val bigC = Vector.tabulate(900)(_*2)
val th = ichi.bench.Thyme.warmed(verbose = print)
th.pbenchOffWarm("A vs. B")(th.Warm(smallC.map(identity)))(th.Warm(bigC.map(identity)))
Benchmark comparison (in 4.694 s): A vs. B
Significantly different (p ~= 0)
Time ratio: 9.31267 95% CI 9.25599 - 9.36935 (n=20)
First 1.492 us 95% CI 1.487 us - 1.496 us
Second 13.89 us 95% CI 13.82 us - 13.96 us
Upvotes: 2
Reputation: 32309
It is pretty straightforward to check that this is not the case. Start by making a test file with possibly optimized forms and compile it using scalac
(with or without -optimise
)
/// TestMap.scala
object TestMap {
def mapOption[T](o: Option[T]): Option[T] = o.map(identity)
def mapList[T](l: List[T]): List[T] = l.map(identity)
def mapSeq[T](l: Seq[T]): Seq[T] = l.map(identity)
}
Then, checking javap -c TestMap.class
you can see that nothing got optimized past specializing map
to mapSeq
, mapList
, or mapOption
:
Compiled from "TestMap.scala"
public final class TestMap {
public static <T extends java/lang/Object> scala.collection.Seq<T> mapSeq(scala.collection.Seq<T>);
Code:
0: getstatic #16 // Field TestMap$.MODULE$:LTestMap$;
3: aload_0
4: invokevirtual #18 // Method TestMap$.mapSeq:(Lscala/collection/Seq;)Lscala/collection/Seq;
7: areturn
public static <T extends java/lang/Object> scala.collection.immutable.List<T> mapList(scala.collection.immutable.List<T>);
Code:
0: getstatic #16 // Field TestMap$.MODULE$:LTestMap$;
3: aload_0
4: invokevirtual #22 // Method TestMap$.mapList:(Lscala/collection/immutable/List;)Lscala/collection/immutable/List;
7: areturn
public static <T extends java/lang/Object> scala.Option<T> mapOption(scala.Option<T>);
Code:
0: getstatic #16 // Field TestMap$.MODULE$:LTestMap$;
3: aload_0
4: invokevirtual #26 // Method TestMap$.mapOption:(Lscala/Option;)Lscala/Option;
7: areturn
More simply, this sort of optimization just doesn't extend well in languages with side-effects (on the other hand, in Haskell, this sort of thing happens regularly). Should the compiler optimise l.map(x => { println(x); x })
to l
or not for example?
Upvotes: 4
Reputation: 2101
Measuring execution time seems to indicate that identity function is O(n):
Function for measuring code execution time, from this link:
def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
result
}
time {(1 to 100000000).map(identity)} // Elapsed time: 8893077396ns
time {(1 to 10).map(identity)} // Elapsed time: 341129ns
// while only creation of range takes similar order of magnitude times.
time {(1 to 10)} // Elapsed time: 30250ns
time {(1 to 100000000)} // Elapsed time: 32351ns
Upvotes: 0