Reputation: 21

Exponent calculation speed

I am currently testing Julia (I have previously worked with MATLAB).

In MATLAB, computing N^3 is slower than computing N*N*N. This doesn't happen with N^2 and N*N. MATLAB uses a different algorithm for higher-order exponents because it favors accuracy over speed.

I think Julia does the same thing.

I wanted to ask if there is a way to force Julia to compute powers of N by multiplication instead of the default algorithm, at least for cubes.

Some time ago I ran a few tests of this in MATLAB, and I have translated that code to Julia.

Links to code: http://pastebin.com/bbeukhTc (I can't upload all the links here :( )

Results of the scripts on MATLAB 2014:

Exponente1

Elapsed time is 68.293793 seconds. (17.7x the fastest)

Exponente2

Elapsed time is 24.236218 seconds. (6.3x the fastest)

Exponente3

Elapsed time is 3.853348 seconds.

Results of the scripts on Julia 0.4.6:

Exponente1

18.423204 seconds (8.22 k allocations: 372.563 KB) (51.6x the fastest)

Exponente2

13.746904 seconds (9.02 k allocations: 407.332 KB) (38.5x the fastest)

Exponente3

0.356875 seconds (10.01 k allocations: 450.441 KB)

In my tests Julia is faster than MATLAB, but I am using a relatively old version and can't test others.

Upvotes: 2

Answers (2)

Lior Kogan

Reputation: 20648

Checking Julia's source code:

julia/base/math.jl:

^(x::Float64, y::Integer) =
    box(Float64, powi_llvm(unbox(Float64,x), unbox(Int32,Int32(y))))
^(x::Float32, y::Integer) =
    box(Float32, powi_llvm(unbox(Float32,x), unbox(Int32,Int32(y))))

julia/base/fastmath.jl:

pow_fast{T<:FloatTypes}(x::T, y::Integer) = pow_fast(x, Int32(y))
pow_fast{T<:FloatTypes}(x::T, y::Int32) =
    box(T, Base.powi_llvm(unbox(T,x), unbox(Int32,y)))

We can see that Julia uses powi_llvm.

Checking LLVM's source code:

define double @powi(double %F, i32 %power) {
; CHECK: powi:
; CHECK: bl __powidf2
    %result = call double @llvm.powi.f64(double %F, i32 %power)
    ret double %result
}

Now, __powidf2 is the interesting function here:

COMPILER_RT_ABI double
__powidf2(double a, si_int b)
{
    const int recip = b < 0;
    double r = 1;
    while (1)
    {
        if (b & 1)
            r *= a;
        b /= 2;
        if (b == 0)
            break;
        a *= a;
    }
    return recip ? 1/r : r;
}

Example 1: given a = 2; b = 7:

 -              r          =   1
 - iteration 1: r = 1 *  2 =   2; b = (int)(7/2) = 3; a = 2 * 2 =  4
 - iteration 2: r = 2 *  4 =   8; b = (int)(3/2) = 1; a = 4 * 4 = 16
 - iteration 3: r = 8 * 16 = 128;

Example 2: given a = 2; b = 8:

 -              r           =   1
 - iteration 1: r           =   1; b = (int)(8/2) = 4; a =  2 *  2 =   4
 - iteration 2: r           =   1; b = (int)(4/2) = 2; a =  4 *  4 =  16
 - iteration 3: r           =   1; b = (int)(2/2) = 1; a = 16 * 16 = 256
 - iteration 4: r = 1 * 256 = 256; b = (int)(1/2) = 0;

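Integer power is always implemented as a sequence of multiplications. With this loop, N^2 takes two multiplications and N^3 takes three (counting the first multiplication into r = 1). That's why N^3 is slower than N^2.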
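The multiplication counts in the two traces above can be checked with a small C sketch. It copies the __powidf2 loop and adds a counter. The names powi_counted and mults are invented here for illustration and are not part of compiler-rt.

```c
/* Same square-and-multiply loop as __powidf2, plus a counter.
   powi_counted and mults are hypothetical names used only in this sketch. */
double powi_counted(double a, int b, int *mults)
{
    const int recip = b < 0;   /* negative exponent: return the reciprocal */
    double r = 1;
    *mults = 0;
    while (1)
    {
        if (b & 1)
        {
            r *= a;            /* fold the current power of a into the result */
            ++*mults;
        }
        b /= 2;                /* truncates toward zero, as in the original */
        if (b == 0)
            break;
        a *= a;                /* square for the next bit of the exponent */
        ++*mults;
    }
    return recip ? 1 / r : r;
}
```

Under these assumptions, powi_counted(2.0, 7, &m) returns 128 with m equal to 5, and powi_counted(2.0, 8, &m) returns 256 with m equal to 4, matching the traces. Exponents 2 and 3 give 2 and 3 multiplications respectively.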

jl_powi_llvm (the function behind the powi_llvm call in fastmath.jl; the "jl_" prefix is added by macro expansion), on the other hand, casts the exponent to floating point and calls pow(). Its C source code:

JL_DLLEXPORT jl_value_t *jl_powi_llvm(jl_value_t *a, jl_value_t *b)
{
    jl_value_t *ty = jl_typeof(a);
    if (!jl_is_bitstype(ty))
        jl_error("powi_llvm: a is not a bitstype");
    if (!jl_is_bitstype(jl_typeof(b)) || jl_datatype_size(jl_typeof(b)) != 4)
        jl_error("powi_llvm: b is not a 32-bit bitstype");
    jl_value_t *newv = newstruct((jl_datatype_t*)ty);
    void *pa = jl_data_ptr(a), *pr = jl_data_ptr(newv);
    int sz = jl_datatype_size(ty);
    switch (sz) {
    /* choose the right size c-type operation */
    case 4:
        *(float*)pr = powf(*(float*)pa, (float)jl_unbox_int32(b));
        break;
    case 8:
        *(double*)pr = pow(*(double*)pa, (double)jl_unbox_int32(b));
        break;
    default:
        jl_error("powi_llvm: runtime floating point intrinsics are not implemented for bit sizes other than 32 and 64");
    }
    return newv;
}

Upvotes: 5

Fengyang Wang

Reputation: 12051

Lior's answer is excellent. To answer the question you posed: yes, there is a way to force the use of multiplication, at the cost of some accuracy. It's the @fastmath macro:

julia> @benchmark 1.1 ^ 3
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     999
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     13.00 ns (0.00% GC)
  median time:      14.00 ns (0.00% GC)
  mean time:        15.74 ns (6.14% GC)
  maximum time:     1.85 μs (98.16% GC)

julia> @benchmark @fastmath 1.1 ^ 3
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     2.00 ns (0.00% GC)
  median time:      3.00 ns (0.00% GC)
  mean time:        2.59 ns (0.00% GC)
  maximum time:     20.00 ns (0.00% GC)

Note that with @fastmath, performance is much better: the minimum time drops from 13 ns to 2 ns, and the 16-byte allocation disappears.

Upvotes: 2
