Reputation: 2504
I would like to initialize a 3d tensor (multi-dimensional array) with the values of the "diagonal Gaussian"
exp(-32*(u^2 + 16*(v^2 + w^2)))
where u = 1/sqrt(3)*(x+y+z)
and v,w
are any two coordinates orthogonal to u
, discretised on a uniform mesh on [-1,1]^3
. The following code achieves this:
function gaussian3d(n)
q = qr(ones(3,1), thin=false)[1]
x = linspace(-1.,1., n)
p = Array(Float64,(n,n,n))
square(x) = x*x
Base.@nloops 3 i p begin
@inbounds p[i_1,i_2,i_3] =
exp(
-32*(
square(q[1,1]*x[i_1] + q[2,1]*x[i_2] + q[3,1]*x[i_3])
+ 16*(
square(q[1,2]*x[i_1] + q[2,2]*x[i_2] + q[3,2]*x[i_3]) +
square(q[1,3]*x[i_1] + q[2,3]*x[i_2] + q[3,3]*x[i_3])
)
)
)
end
return p
end
It seems to be quite slow, however. For example, if I replace the defining function with exp(x*y*z)
, the code runs 50x faster. Also, the @time
macro reports ~20% GC time for the above code which I do not understand where they come from. (These numeric values were obtained with n = 128
.) My questions therefore are
Upvotes: 3
Views: 392
Reputation: 11912
Knowing nothing of 3D tensors with values of the "diagonal Gaussian", using thesquare
comment from the original post, "typing" q
(@code_warntype
helps here: Big performance jump!), and further specializing the @nloops
, this works much faster on the platforms I tried it on.
julia> square(x::Float64) = x * x
square (generic function with 1 method)
julia> function my_gaussian3d(n)
q::Array{Float64,2} = qr(ones(3,1), thin=false)[1]
x = linspace(-1.,1., n)
p = Array(Float64,(n,n,n))
Base.@nloops 3 i p d->x_d=x[i_d] begin
@inbounds p[i_1,i_2,i_3] =
exp(
-32*(
square(q[1,1]*x_1 + q[2,1]*x_2 + q[3,1]*x_3)
+ 16*(
square(q[1,2]*x_1 + q[2,2]*x_2 + q[3,2]*x_3) +
square(q[1,3]*x_1 + q[2,3]*x_2 + q[3,3]*x_3)
)
)
)
end
return p
end
my_gaussian3d (generic function with 1 method)
julia> @time gaussian3d(128);
elapsed time: 3.952389337 seconds (1264 MB allocated, 4.50% gc time in 57 pauses with 0 full sweep)
julia> @time gaussian3d(128);
elapsed time: 3.527316699 seconds (1264 MB allocated, 4.42% gc time in 58 pauses with 0 full sweep)
julia> @time my_gaussian3d(128);
elapsed time: 0.285837566 seconds (16 MB allocated)
julia> @time my_gaussian3d(128);
elapsed time: 0.28476448 seconds (16 MB allocated, 1.22% gc time in 0 pauses with 0 full sweep)
julia> my_gaussian3d(128) == gaussian3d(128)
true
Upvotes: 2