Reputation: 1117
In the NFFT package there are fast, specialized functions for vectors and matrices and slow functions for general arrays. I would like to make the general function fast and the specialized functions obsolete, but I'm running into trouble.
One family of specialized functions looks essentially like this (offset is computed in the actual code, but its value is not important for this question):
function myfunc!(f::Vector, g::Vector)
    offset = 5
    n = length(g)
    N = length(f)
    for l = 1:N
        g[((l + offset) % n) + 1] = f[l]
    end
end
and
function myfunc!(f::Matrix, g::Matrix)
    offsetx = 5
    offsety = 5
    n1, n2 = size(g)
    N1, N2 = size(f)
    for ly = 1:N2
        for lx = 1:N1
            g[((lx + offsetx) % n1) + 1, ((ly + offsety) % n2) + 1] = f[lx, ly]
        end
    end
end
The general function could be written like this:
function myfunc!{T,D}(f::Array{T,D}, g::Array{T,D})
    offset = ntuple(d -> 5, D)
    n = size(g)
    for l in CartesianRange(size(f))
        idx = CartesianIndex{D}(ntuple(d -> ((l[d] + offset[d]) % n[d]) + 1, D))
        g[idx] = f[l]
    end
end
Unfortunately, this is terribly slow. The majority of the time is spent in the ntuple call inside the loop.
Another possibility for idx is to preallocate it with idx = Array{Int}(D) and write the inner loop as
for d = 1:D
    idx[d] = ((l[d] + offset[d]) % n[d]) + 1
end
g[idx...] = f[l]
This is also slow.
I'm thinking that since the dimension D is a type parameter, a @generated function can be made for computing idx, but I cannot figure out how to do that (or whether there is a better way).
I'm using Julia v0.4.5.
Upvotes: 2
Views: 101
Reputation: 28222
The answer for how to do this with a generated function is to use the Base.Cartesian helper macros.
using Base.Cartesian

@generated function myfunc!{T,D}(f::Array{T,D}, g::Array{T,D})
    quote
        @nexprs $D d -> (offset_d = 5)  # declares offset_1 = 5, offset_2 = 5, etc.
        @nloops $D l f begin
            (@nref $D g d -> (l_d + offset_d) % size(g, d) + 1) = @nref $D f l
        end
    end
end
I have confirmed this is correct, for 2D at least. I leave it as an exercise to the reader to profile it. It more or less does not allocate.
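As a quick sanity check, the following sketch exercises the generated function on a small 2-D case. It uses Julia 0.4 syntax to match the answer; the array sizes and the expected values are illustrative assumptions, not from the original post, and the definition is repeated so the snippet runs on its own.

```julia
using Base.Cartesian

# Same generated function as in the answer above.
@generated function myfunc!{T,D}(f::Array{T,D}, g::Array{T,D})
    quote
        @nexprs $D d -> (offset_d = 5)  # offset_1 = 5, offset_2 = 5, ...
        @nloops $D l f begin
            (@nref $D g d -> (l_d + offset_d) % size(g, d) + 1) = @nref $D f l
        end
    end
end

f = reshape(collect(1.0:4.0), 2, 2)  # 2x2 input
g = zeros(3, 3)                      # 3x3 output, so indices wrap mod 3
myfunc!(f, g)
```

With offset 5 and n = 3, the index map ((l + 5) % 3) + 1 sends l = 1 to 1 and l = 2 to 2, so f should land unchanged in the top-left 2x2 block of g while the rest of g stays zero.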
Upvotes: 1