krcools

Optimising away residual heap allocation in Julia

I ran julia --track-allocation prof.jl, which produced the following output (the number in front of each line is the number of bytes allocated on that line):

    - using FixedSizeArrays
    - 
    - immutable KernelVals{T}
    -     wavenumber::T
    -     vect::Vec{3,T}
    -     dist::T
    -     green::Complex{T}
    -     gradgreen::Vec{3,Complex{T}}
    - end
    - 
    - function kernelvals(k, x, y)
    -     r = x - y
    0     R2 =  r[1]*r[1]
    0     R2 += r[2]*r[2]
    0     R2 += r[3]*r[3]
    0     R = sqrt(R2)
    - 
    0     γ = im*k
    0     expn = exp(-γ * R)
    0     fctr = 1.0 / (4.0*pi*R)
    0     green = fctr * expn
   64     gradgreen = -(γ + 1/R) * green / R * r
    - 
    0     KernelVals(k, r, R, green, gradgreen)
    - end
    - 
    - function payload()
    -   x = Vec{3,Float64}(0.47046262275611883,0.8745228524771103,-0.049820876498487966)
    0   y = Vec{3,Float64}(-0.08977259509004082,0.543199687600189,0.8291184043296924)
    0   k = 1.0
    0   kv = kernelvals(k,x,y)
    -   return kv
    - end
    - 
    - function driver()
    -   println("Flush result: ", payload())
    0   Profile.clear_malloc_data()
    0   payload()
    - end
    - 
    - driver()

I cannot get rid of the last remaining allocation, on the line starting with gradgreen. Running @code_warntype kernelvals(...) reveals no type instability or uncertainty.

The allocation pattern is identical on julia-0.4.6 and julia-0.5.0-pre.

This function will be the inner kernel of a boundary element method I am implementing. It will be called literally millions of times, so even a 64-byte allocation per call adds up to a cumulative allocation that can reach several times the physical memory available to me.

The reason I am using FixedSizeArrays is to avoid allocations related to the creation of small Arrays.
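Why fixed-size types help: an immutable isbits value such as a tuple of Float64 has a known size and can be stored inline (in registers or on the stack), whereas an Array is a mutable heap object. A minimal sketch, assuming Julia 1.x and using a plain NTuple (under the hypothetical alias Vec3) as a stand-in for FixedSizeArrays' Vec so that it runs without extra packages:

```julia
# Hypothetical alias: a 3-vector as a plain tuple (immutable, fixed length).
const Vec3 = NTuple{3,Float64}

println(isbitstype(Vec3))             # true: can be stored inline, no heap object needed
println(isbitstype(Vector{Float64}))  # false: an Array is a mutable heap object

# Componentwise difference, analogous to r = x - y in the question.
sub(x::Vec3, y::Vec3) = (x[1] - y[1], x[2] - y[2], x[3] - y[3])

x = (0.47046262275611883, 0.8745228524771103, -0.049820876498487966)
y = (-0.08977259509004082, 0.543199687600189, 0.8291184043296924)
r = sub(x, y)
println(r)
```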

The precise line to which the allocation is attributed depends very sensitively on the code. At one point the memory profiler blamed 1/(4*pi*R) as the line triggering the allocation.

Any help, or general tips on writing code with predictable allocation patterns, would be highly appreciated.

Answers (1)

krcools

After some experiments I finally managed to get rid of all allocations. The culprit turned out to be the promotion machinery as extended by FixedSizeArrays: apparently, multiplying a complex scalar by a real vector creates a temporary along the way.
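For comparison, the sketch below scales a real 3-vector by a complex scalar in Julia 1.x. Plain tuples are an assumption standing in for Vec{3,Float64}, and cscale is a hypothetical helper. The point it illustrates is that element-type promotion (Float64 to ComplexF64) happens per component and yields another isbits tuple, not an intermediate array.

```julia
# A real 3-vector stored as a plain tuple (stand-in for Vec{3,Float64}).
r = (1.0, 2.0, 3.0)
c = 0.5 - 2.0im

# Complex times Float64 promotes to ComplexF64.
println(promote_type(ComplexF64, Float64))

# Hypothetical helper: multiply each component by c. map over a tuple
# returns a tuple, so the result is an isbits NTuple{3,ComplexF64}.
cscale(c::Complex, v::NTuple{N,Float64}) where {N} = map(x -> c * x, v)

s = cscale(c, r)
println(s)
```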

Replacing the definition of gradgreen with

    c = -(γ + 1/R) * green / R
    gradgreen = Vec(c*r[1], c*r[2], c*r[3])

results in allocation-free runs. In my benchmark, execution time came down from 6.5 seconds to 4.15 seconds, and the total allocation size from 4.5 GB to 1.4 GB.
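A self-contained sketch of the whole kernel with this workaround, assuming Julia 1.x and using plain tuples in place of FixedSizeArrays' Vec (a hypothetical port, not the code from the question). The helper kernel_allocs, also hypothetical, measures a single call with Base's @allocated from inside a function, so that compilation and boxing of global variables are not counted.

```julia
# Hypothetical Julia 1.x port: plain tuples replace FixedSizeArrays.Vec.
struct KernelVals{T}
    wavenumber::T
    vect::NTuple{3,T}
    dist::T
    green::Complex{T}
    gradgreen::NTuple{3,Complex{T}}
end

function kernelvals(k::T, x::NTuple{3,T}, y::NTuple{3,T}) where {T<:AbstractFloat}
    r = (x[1] - y[1], x[2] - y[2], x[3] - y[3])
    R = sqrt(r[1]^2 + r[2]^2 + r[3]^2)
    γ = im * k
    green = exp(-γ * R) / (4 * T(pi) * R)
    # Scalar coefficient first, then build the vector componentwise.
    c = -(γ + 1 / R) * green / R
    gradgreen = (c * r[1], c * r[2], c * r[3])
    return KernelVals(k, r, R, green, gradgreen)
end

# Hypothetical helper: bytes allocated by one call, measured inside a
# function so the arguments have concrete, inferred types.
function kernel_allocs(k, x, y)
    kernelvals(k, x, y)                  # warm-up (compiles if needed)
    return @allocated kernelvals(k, x, y)
end

x = (0.47046262275611883, 0.8745228524771103, -0.049820876498487966)
y = (-0.08977259509004082, 0.543199687600189, 0.8291184043296924)
kv = kernelvals(1.0, x, y)
println(kernel_allocs(1.0, x, y))
```

Because KernelVals holds only isbits fields, it is itself isbits and can be returned without a heap allocation once all types are inferred.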

EDIT: I reported this issue to the FixedSizeArrays developers, who fixed it immediately (thank you!). After their fix, the allocations disappeared completely.
