Karel Horak
Karel Horak

Reputation: 1062

Performance of <: Any in Julia

I am new to Julia, and I am trying to understand performance implications of some of the constructs before getting used to bad habits. Currently, I am trying to understand the type system of Julia, especially the <: Any type annotation. As far as I understand, <: Any should stand for I don't care about the type.

Consider the following code

struct Container{T}
    parametric::T
    nonparametric::Int64
end

struct TypeAny
    payload::Container{<: Any}
end

struct TypeKnown
    payload::Container{Array{Int64,1}}
end

getparametric(x) = x.payload.parametric[1]
getnonparametric(x) = x.payload.nonparametric

xany = TypeAny(Container([1], 2))
xknown = TypeKnown(Container([1], 2))

@time for i in 1:10000000 getparametric(xany) end         # 0.212002s
@time for i in 1:10000000 getparametric(xknown) end       # 0.110531s

@time for i in 1:10000000 getnonparametric(xany) end      # 0.173390s
@time for i in 1:10000000 getnonparametric(xknown) end    # 0.086739s

First of all, I was surprised that getparametric(xany) works in the first place when it operates on a field Container{<: Any}.parametric of unknown type. How is that possible and what are the performance implications of such construct? Is Julia doing some kind of runtime reflection behind the scenes to make this possible, or something more sophisticated is going on?

Second, I was surprised by the difference in runtime between the calls getnonparametric(xany) and getnonparametric(xknown) which contradicts my intuition of using type annotation <: Any as an I don't care annotation. Why the call to getnonparametric(xany) is significantly slower, even though I use only a field of known type? And how to ignore type in case I do not want to use any variables of that type without taking a performance hit? (In my use case, it seems not to be possible to specify the concrete type as that would lead to infinitely recursive type definitions - but that could be caused by improper design of my code which is out of the scope of this question.)

Upvotes: 3

Views: 221

Answers (1)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69829

<: Any should stand for I don't care about the type.

It is something like it can be any type (so compiler does not get any hint about the type). You could also write it as:

struct TypeAny
    payload::Container
end

which is essentially the same as you can check using the following test:

julia> Container{<:Any} <: Container
true

julia> Container <: Container{<:Any}
true

How is that possible and what are the performance implications of such construct?

The performance implication is that the concrete type of the object you hold in your container is determined in run-time not in compile time (just as you have suspected).

Note however that if you pass such extracted object to a function then after a dynamic dispatch inside the called function the code will run fast (as it will be type stable). You can read more about it here.

or something more sophisticated is going on?

The more sophisticated thing happens for bits types. If a concrete bits type is a field in a container then it is stored as value. If its type is not known at compile time it will be stored as reference (which has yet additional memory and run time impact).

I was surprised by the difference in runtime between the calls

As commented above, the difference is due to the fact that at compile time the type of the field is not known. If you changed your definition to:

struct TypeAny{T}
    payload::Container{T}
end

then you say I do not care about type, but store it in a parameter, so that compiler knows this type.

Then the type of payload would be known at compile time and all would be fast.

If something I have written above is not clear or you need some more explanations please comment and I will expand the answer.

As a side note - it is usually better to use BenchmarkTools.jl for performance analysis of your code (unless you want to measure compilation time also).

EDIT

Look at:

julia> loop(x) = for i in 1:10000000 getnonparametric(x) end
loop (generic function with 1 method)

julia> @code_native loop(xknown)
        .text
; ┌ @ REPL[14]:1 within `loop'
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rax
        movq    %rdx, -8(%rbp)
        movl    $74776584, %eax         # imm = 0x4750008
        addq    $8, %rsp
        popq    %rbp
        retq
; └

julia> @code_native loop(xany)
        .text
; ┌ @ REPL[14]:1 within `loop'
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rax
        movq    %rdx, -8(%rbp)
        movl    $74776584, %eax         # imm = 0x4750008
        addq    $8, %rsp
        popq    %rbp
        retq
; └

And you see that the compiler is smart enough to optimize-out the whole loop (as it is essentially a no-op). This is a power of Julia (but on the other hand - makes benchmarking hard sometimes).

Here is an example that shows you a more accurate view (note that I use a more complex expression, as even very simple expressions in loops can be optimized out by the compiler):

julia> xknowns = fill(xknown, 10^6);

julia> xanys = fill(xany, 10^6);

julia> @btime sum(getnonparametric, $xanys)
  12.373 ms (0 allocations: 0 bytes)
2000000

julia> @btime sum(getnonparametric, $xknowns)
  519.700 μs (0 allocations: 0 bytes)
2000000

Note that even in this case the compiler is "smart enough" to properly infer the return type of the expression in both cases as you access nonparametric field in both cases:

julia> @code_warntype sum(getnonparametric, xanys)
Variables
  #self#::Core.Compiler.Const(sum, false)
  f::Core.Compiler.Const(getnonparametric, false)
  a::Array{TypeAny,1}

Body::Int64
1 ─      nothing
│   %2 = Base.:(#sum#559)(Base.:(:), #self#, f, a)::Int64
└──      return %2

julia> @code_warntype sum(getnonparametric, xknowns)
Variables
  #self#::Core.Compiler.Const(sum, false)
  f::Core.Compiler.Const(getnonparametric, false)
  a::Array{TypeKnown,1}

Body::Int64
1 ─      nothing
│   %2 = Base.:(#sum#559)(Base.:(:), #self#, f, a)::Int64
└──      return %2

The core of the difference can be seen when you look at native code generated in both cases:

julia> @code_native getnonparametric(xany)
        .text
; ┌ @ REPL[6]:1 within `getnonparametric'
        pushq   %rbp
        movq    %rsp, %rbp
; │┌ @ Base.jl:20 within `getproperty'
        subq    $48, %rsp
        movq    (%rcx), %rax
        movq    %rax, -16(%rbp)
        movq    $75966808, -8(%rbp)     # imm = 0x4872958
        movabsq $jl_f_getfield, %rax
        leaq    -16(%rbp), %rdx
        xorl    %ecx, %ecx
        movl    $2, %r8d
        callq   *%rax
; │└
        movq    (%rax), %rax
        addq    $48, %rsp
        popq    %rbp
        retq
        nopl    (%rax,%rax)
; └

julia> @code_native getnonparametric(xknown)
        .text
; ┌ @ REPL[6]:1 within `getnonparametric'
        pushq   %rbp
        movq    %rsp, %rbp
; │┌ @ Base.jl:20 within `getproperty'
        movq    (%rcx), %rax
; │└
        movq    8(%rax), %rax
        popq    %rbp
        retq
        nopl    (%rax)
; └

If you add parameter to the type all is working as expected:

julia> struct Container{T}
           parametric::T
           nonparametric::Int64
       end

julia> struct TypeAny2{T}
           payload::Container{T}
       end

julia> xany2 = TypeAny2(Container([1], 2))
TypeAny2{Array{Int64,1}}(Container{Array{Int64,1}}([1], 2))

julia> @code_native getnonparametric(xany2)
        .text
; ┌ @ REPL[9]:1 within `getnonparametric'
        pushq   %rbp
        movq    %rsp, %rbp
; │┌ @ Base.jl:20 within `getproperty'
        movq    (%rcx), %rax
; │└
        movq    8(%rax), %rax
        popq    %rbp
        retq
        nopl    (%rax)
; └

And you have:

julia> xany2s = fill(xany2, 10^6);

julia> @btime sum(getnonparametric, $xany2s)
  528.699 μs (0 allocations: 0 bytes)
2000000

Summary

  1. Always try to use containers that do not have fields of abstract type if you want performance.
  2. Sometimes if condition in point 1. is not met the compiler can handle it efficiently and generate a fast machine code, but it is not guaranteed in general (so still the recommendation in point 1. applies).

Upvotes: 3

Related Questions