robertdj
robertdj

Reputation: 1117

In Julia NFFT, replace vector and matrix methods with a single array method

In the NFFT package there are fast, specialized functions for vectors and matrices and slow functions for general arrays. I would like to make the general function fast and the specialized functions obsolete, but I'm running into trouble.

One family of specialized functions look essentially like this (offset is computed in the actual code, but its value is not important for this question):

myfunc!(f::Vector, g::Vector)
    offset = 5
    n = length(g)
    N = length(f)

    for l = 1:N
        g[ ((l+offset)%n)+1 ] = f[l]
    end
end

and

myfunc!(f::Matrix, g::Matrix)
    offsetx = 5
    offsety = 5
    n1, n2 = size(g)
    N1, N2 = size(f)

    for ly = 1:N2
        for lx = 1:N1
            g[ ((lx+offsetx)%n1)+1, ((ly+offsety)%n2)+1 ] = f[lx, ly]
        end
    end
end

The general function could be written like this

myfunc!{D}(f::Array{D}, g::Array{D})
    offset = ntuple(d -> 5, D)
    n = size(g)

    for l in CartesianRange(size(f))
        idx = CartesianIndex{D}( ntuple(d -> ((l[d]+offset[d])%n[d])+1, D) )
        g[ idx ] = f[l]
    end
end

Unfortunately, this is terribly slow. The majority of the time is spent in the ntuple in the loop.

Other possibilities for idx include letting idx = Array{Int}(D) and make the inner loop look like

for d = 1:D
    idx[d] = ((l[d]+offset[d])%n[d])+1
end
g[idx...] = f[l]

This is also slow.

I'm thinking that since the dimension D is a type argument, a @generated function can be made for computing idx, but I cannot figure out how to do that (or if there is a better way).

I'm using Julia v0.4.5.

Upvotes: 2

Views: 101

Answers (1)

Frames Catherine White
Frames Catherine White

Reputation: 28222

The answer for how to do this with a generated function is to use the Base.Cartesian helper macros.

using Base.Cartesian 

@generated function myfunc!{T,D}(f::Array{T,D}, g::Array{T,D})
    quote
        @nexprs $D d->offset_d=5 #Declase offset_1=5, offset_2=5 etc

        @nloops $D l f begin
            (@nref $D g d->(l_d+offset_d)%size(g,d)+1) = @nref $D f l 
        end
    end
end

I have confirmed this is correct, for 2D at least. I leave it as an excise to the reader to profile it. It more or less does not allocate.

Upvotes: 1

Related Questions