Jakob Elias

Reputation: 63

Does multiplying an integer array by a Float increase or decrease performance compared to generating a Float array in Julia?

I was wondering whether the expression dx*collect(0:J), where J is an Int64 and dx a Float64, is computationally more efficient than collect(0:dx:dx*J), or the other way around?

To phrase it differently: Is it more efficient to generate an integer array and multiply (J+1) times by a float (1st case) or generate a float array in the first place (2nd case)?

What are the benefits/downsides to each case?

Upvotes: 4

Views: 290

Answers (2)

Please be aware that the two solutions do not give exactly the same results:

julia> dx = 0.1;
julia> J = 10;

julia> a = dx*collect(0:J)
11-element Array{Float64,1}:
 0.0                
 0.1                
 0.2                
 0.30000000000000004
 0.4                
 0.5                
 0.6000000000000001 
 0.7000000000000001 
 0.8                
 0.9                
 1.0                

julia> b = collect(0:dx:J*dx)
11-element Array{Float64,1}:
 0.0
 0.1
 0.2
 0.3
 0.4
 0.5
 0.6
 0.7
 0.8
 0.9
 1.0

julia> maximum(abs.(b-a))
1.1102230246251565e-16

This is due to tricky floating-point issues. As the type of floating-point ranges like 0:dx:J*dx reveals (see below), their implementation goes to great lengths to avoid round-off errors, doubling the precision of floats for some intermediate results, so its values differ slightly from those of the simple integer-based computation.

julia> typeof(0:dx:J*dx)
StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}
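A minimal sketch (not from the original answer) of the round-off at play: the binary value stored for 0.1 is not exactly 0.1, so naive multiplication drifts, while the range machinery lands on the correctly rounded element:

```julia
dx = 0.1  # not exactly representable as a binary Float64

# Naive multiplication accumulates the representation error:
3 * dx              # 0.30000000000000004

# The StepRangeLen machinery computes elements in twice precision,
# so the 4th element (index 4, 1-based) is the correctly rounded 0.3:
(0:dx:1.0)[4]       # 0.3
```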


Also note that a comprehension gets you the same results as your first implementation, without the intermediate allocation:

julia> c = [dx*i for i in 0:J]
11-element Array{Float64,1}:
 0.0                
 0.1                
 0.2                
 0.30000000000000004
 0.4                
 0.5                
 0.6000000000000001 
 0.7000000000000001 
 0.8                
 0.9                
 1.0                

julia> maximum(abs.(c-a))
0.0


Benchmarks seem to indicate that, performance-wise and neglecting possible round-off issues, the comprehension is fastest in most cases:

julia> using BenchmarkTools
julia> collect_first(dx, J) = dx*collect(0:J);
julia> collect_all(  dx, J) = collect(0:dx:dx*J);
julia> comprehension(dx, J) = [dx*i for i in 0:J];

# Small size
julia> @btime collect_first(0.1, 100);
  206.895 ns (2 allocations: 1.75 KiB)

julia> @btime collect_all(0.1, 100);
  476.533 ns (1 allocation: 896 bytes)

julia> @btime comprehension(0.1, 100);
  134.363 ns (1 allocation: 896 bytes)

# Large size
julia> @btime collect_first(0.1, 1_000_000);
  1.970 ms (4 allocations: 15.26 MiB)

julia> @btime collect_all(0.1, 1_000_000);
  2.557 ms (2 allocations: 7.63 MiB)

julia> @btime comprehension(0.1, 1_000_000);
  900.449 μs (2 allocations: 7.63 MiB)

Upvotes: 2

Nils Gudat

Reputation: 13800

I'm sure there'll be some people around who can give you a full explanation in terms of what code is being generated by LLVM etc., but in simple cases like this when in doubt you can just benchmark:

julia> using BenchmarkTools

julia> collect_first(dx, J) = dx*collect(0:J)
collect_first (generic function with 1 method)

julia> collect_all(dx, J) = collect(0:dx:dx*J)
collect_all (generic function with 1 method)

julia> @btime collect_first(3.2, 100);
  172.194 ns (2 allocations: 1.75 KiB)

julia> @btime collect_all(3.2, 100);
  359.330 ns (1 allocation: 896 bytes)

julia> @btime collect_first(3.2, 10_000);
  11.300 μs (4 allocations: 156.53 KiB)

julia> @btime collect_all(3.2, 10_000);
  18.601 μs (2 allocations: 78.27 KiB)

julia> @btime collect_first(3.2, 100_000);
  145.499 μs (4 allocations: 1.53 MiB)

julia> @btime collect_all(3.2, 100_000);
  183.300 μs (2 allocations: 781.39 KiB)

julia> @btime collect_first(3.2, 1_000_000);
  5.650 ms (4 allocations: 15.26 MiB)

julia> @btime collect_all(3.2, 1_000_000);
  3.806 ms (2 allocations: 7.63 MiB)

So if you collect first, you double the number of allocations: one allocation happens when collect materializes the integer range, and another when multiplying it to produce the Float64 output. For small arrays this is nevertheless faster in terms of computation time, but for large arrays collecting after multiplying the range strictly dominates.
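The doubled-allocation claim can be checked directly with Base's @allocated macro, without BenchmarkTools. This is a sketch, not part of the original answer; the exact byte counts depend on the Julia version, but collect_first should allocate roughly twice the bytes of collect_all:

```julia
# Same two functions as benchmarked above:
collect_first(dx, J) = dx*collect(0:J)     # Int vector, then Float64 vector
collect_all(dx, J)   = collect(0:dx:dx*J)  # only the Float64 vector

dx, J = 3.2, 1_000_000

# Warm up first so compilation allocations are excluded:
collect_first(dx, J); collect_all(dx, J)

bytes_first = @allocated collect_first(dx, J)
bytes_all   = @allocated collect_all(dx, J)

# collect_first allocates about twice as many bytes as collect_all,
# matching the 15.26 MiB vs 7.63 MiB reported by @btime.
bytes_first > 1.5 * bytes_all
```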

Upvotes: 3
