Reputation: 63
I was wondering whether the expression
dx*collect(0:J)
where J
is Int64
and dx
is Float64
is computationally more efficient than collect(0:dx:dx*J)
or the other way around?
To phrase it differently: Is it more efficient to generate an integer array and multiply (J+1) times by a float (1st case) or generate a float array in the first place (2nd case)?
What are the benefits/downsides to each case?
Upvotes: 4
Views: 290
Reputation: 20248
Please be aware that the two solutions do not give exactly the same results:
julia> dx = 0.1;
julia> J = 10;
julia> a = dx*collect(0:J)
11-element Array{Float64,1}:
0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6000000000000001
0.7000000000000001
0.8
0.9
1.0
julia> b = collect(0:dx:J*dx)
11-element Array{Float64,1}:
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
julia> maximum(abs.(b-a))
1.1102230246251565e-16
This is because of tricky floating-point issues. As the type of floating-point ranges like 0:dx:J*dx reveals
(see below), the implementation of such ranges goes to great lengths to avoid round-off errors, doubling the precision of the floats for some intermediate results, so its results are slightly different from those of the simple integer-based implementation.
julia> typeof(0:dx:J*dx)
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
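To check which variant is closer to the true values, one can compare both arrays against exact rational arithmetic (a quick sanity check, not part of the original benchmarks; big(i)//10 builds the mathematically exact values as Rational{BigInt}):

```julia
julia> exact = [big(i)//10 for i in 0:J];  # exact values 0, 1/10, ..., 1

julia> Float64(maximum(abs.(big.(a) .- exact)))  # worst-case error of dx*collect(0:J)

julia> Float64(maximum(abs.(big.(b) .- exact)))  # worst-case error of the TwicePrecision range
```

The range-based version should come out at least as accurate, which is the point of the double-precision intermediate arithmetic.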
Also note that a comprehension might get you the same results as your first implementation, without any intermediate allocation:
julia> c = [dx*i for i in 0:J]
11-element Array{Float64,1}:
0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6000000000000001
0.7000000000000001
0.8
0.9
1.0
julia> maximum(abs.(c-a))
0.0
Benchmarks seem to indicate that, performance-wise and neglecting possible round-off issues, using a comprehension is the fastest option in most cases:
julia> using BenchmarkTools
julia> collect_first(dx, J) = dx*collect(0:J);
julia> collect_all( dx, J) = collect(0:dx:dx*J);
julia> comprehension(dx, J) = [dx*i for i in 0:J];
# Small size
julia> @btime collect_first(0.1, 100);
206.895 ns (2 allocations: 1.75 KiB)
julia> @btime collect_all(0.1, 100);
476.533 ns (1 allocation: 896 bytes)
julia> @btime comprehension(0.1, 100);
134.363 ns (1 allocation: 896 bytes)
# Large size
julia> @btime collect_first(0.1, 1_000_000);
1.970 ms (4 allocations: 15.26 MiB)
julia> @btime collect_all(0.1, 1_000_000);
2.557 ms (2 allocations: 7.63 MiB)
julia> @btime comprehension(0.1, 1_000_000);
900.449 μs (2 allocations: 7.63 MiB)
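As a side note (not benchmarked above): if the values are only consumed once, e.g. iterated over or summed, a generator expression avoids allocating any array at all:

```julia
julia> sum(dx * i for i in 0:J)  # generator expression: no intermediate array is allocated
```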
Upvotes: 2
Reputation: 13800
I'm sure there'll be some people around who can give you a full explanation in terms of what code is being generated by LLVM etc., but in simple cases like this, when in doubt, you can just benchmark:
julia> using BenchmarkTools
julia> collect_first(dx, J) = dx*collect(0:J)
collect_first (generic function with 1 method)
julia> collect_all(dx, J) = collect(0:dx:dx*J)
collect_all (generic function with 1 method)
julia> @btime collect_first(3.2, 100);
172.194 ns (2 allocations: 1.75 KiB)
julia> @btime collect_all(3.2, 100);
359.330 ns (1 allocation: 896 bytes)
julia> @btime collect_first(3.2, 10_000);
11.300 μs (4 allocations: 156.53 KiB)
julia> @btime collect_all(3.2, 10_000);
18.601 μs (2 allocations: 78.27 KiB)
julia> @btime collect_first(3.2, 100_000);
145.499 μs (4 allocations: 1.53 MiB)
julia> @btime collect_all(3.2, 100_000);
183.300 μs (2 allocations: 781.39 KiB)
julia> @btime collect_first(3.2, 1_000_000);
5.650 ms (4 allocations: 15.26 MiB)
julia> @btime collect_all(3.2, 1_000_000);
3.806 ms (2 allocations: 7.63 MiB)
So if you collect first, you double the number of allocations (presumably because one allocation happens for the collect
and another for the multiplication that produces the output). For small arrays, however, this still seems to be faster in terms of computation time. For large arrays, collecting after multiplying the range strictly dominates.
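For completeness, a sketch of one more variant (not benchmarked above): broadcasting the multiplication over the un-collected range materializes the Float64 array in a single allocation, avoiding the intermediate integer array entirely:

```julia
julia> broadcast_mult(dx, J) = dx .* (0:J);  # broadcast: one Float64 array, no intermediate Int array

julia> broadcast_mult(3.2, 100) == collect_first(3.2, 100)  # element-wise identical to collect-then-multiply
true
```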
Upvotes: 3