Reputation: 927
I ran julia --track-allocation prof.jl, resulting in the following output:
- using FixedSizeArrays
-
- immutable KernelVals{T}
- wavenumber::T
- vect::Vec{3,T}
- dist::T
- green::Complex{T}
- gradgreen::Vec{3,Complex{T}}
- end
-
- function kernelvals(k, x, y)
- r = x - y
0 R2 = r[1]*r[1]
0 R2 += r[2]*r[2]
0 R2 += r[3]*r[3]
0 R = sqrt(R2)
-
0 γ = im*k
0 expn = exp(-γ * R)
0 fctr = 1.0 / (4.0*pi*R)
0 green = fctr * expn
64 gradgreen = -(γ + 1/R) * green / R * r
-
0 KernelVals(k, r, R, green, gradgreen)
- end
-
- function payload()
- x = Vec{3,Float64}(0.47046262275611883,0.8745228524771103,-0.049820876498487966)
0 y = Vec{3,Float64}(-0.08977259509004082,0.543199687600189,0.8291184043296924)
0 k = 1.0
0 kv = kernelvals(k,x,y)
- return kv
- end
-
- function driver()
- println("Flush result: ", payload())
0 Profile.clear_malloc_data()
0 payload()
- end
-
- driver()
I cannot get rid of the final memory allocation on the line that starts with gradgreen. I ran @code_warntype kernelvals(...), which revealed no type instability or uncertainty.
The allocation pattern is identical on julia-0.4.6 and julia-0.5.0-pre.
This function will be the inner kernel in a boundary element method I am implementing. It will be called literally millions of times, resulting in a gross memory allocation that can grow to be a multiple of the physical memory available to me.
The reason I am using FixedSizeArrays is to avoid allocations related to the creation of small Arrays.
The precise location where the allocation is reported depends in a very sensitive manner on the code. At some point the memory profiler was blaming 1/(4*pi*R) as the line triggering allocation.
Any help, or general tips on how to write code with predictable allocation patterns, would be highly appreciated.
Upvotes: 6
Views: 238
Reputation: 927
After some experiments I finally managed to get rid of all allocations. The culprit turned out to be the promotion architecture as extended in FixedSizeArrays. Apparently multiplying a complex scalar and a real vector creates a temporary along the way.
Replacing the definition of gradgreen with
c = -(γ + 1/R) * green / R
gradgreen = Vec(c*r[1], c*r[2], c*r[3])
results in allocation-free runs. In my benchmark example, execution time came down from 6.5 seconds to 4.15 seconds and total allocation size from 4.5 GB to 1.4 GB.
EDIT: I reported this issue to the FixedSizeArrays developers, who fixed it immediately (thank you!). The allocations disappeared completely.
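For anyone who wants to check a kernel like this without reading the .mem files, here is a minimal sketch of the same computation measured with @allocated. It rests on assumptions that are not in my original code: it uses current Julia 1.x syntax (struct instead of immutable), and plain tuples stand in for the Vec type from FixedSizeArrays so that it runs without any packages. It is an illustration of the component-wise scaling described above, not the code from my benchmark.

```julia
# Sketch for Julia 1.x. Plain NTuples replace FixedSizeArrays' Vec (an assumption
# made so the example has no package dependencies).
struct KernelVals{T}
    wavenumber::T
    vect::NTuple{3,T}
    dist::T
    green::Complex{T}
    gradgreen::NTuple{3,Complex{T}}
end

function kernelvals(k, x, y)
    # Difference vector and its Euclidean length.
    r = (x[1] - y[1], x[2] - y[2], x[3] - y[3])
    R = sqrt(r[1]*r[1] + r[2]*r[2] + r[3]*r[3])

    γ = im * k
    green = exp(-γ * R) / (4π * R)

    # Scale each component by the complex factor explicitly, so no
    # intermediate vector has to be built by a promotion step.
    c = -(γ + 1/R) * green / R
    gradgreen = (c * r[1], c * r[2], c * r[3])

    return KernelVals(k, r, R, green, gradgreen)
end

function payload()
    x = (0.47046262275611883, 0.8745228524771103, -0.049820876498487966)
    y = (-0.08977259509004082, 0.543199687600189, 0.8291184043296924)
    return kernelvals(1.0, x, y)
end

payload()                      # warm-up call so compilation is not measured
println(@allocated payload())  # bytes allocated by a single call
```

The warm-up call matters because the first call includes compilation, which allocates on its own. Measuring inside a function rather than at global scope tends to give the most trustworthy numbers.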
Upvotes: 3