Harry McLary

Reputation: 21

Basic operations combining two SharedArrays

I've spent the last month or so learning Julia and I'm very impressed. In particular, I'm analysing large amounts of climate model output: I put it all into SharedArrays and adjust and plot it in parallel. So far it's been very quick and efficient, and I've built up quite a library of code. My current problem is creating a function that can do basic operations on two shared arrays. I've successfully written a function that takes two arrays plus a function specifying how to combine them. The code is based on the example in the parallel section of the Julia docs and uses the myrange function shown there:

function myrange(q::SharedArray)
    idx = indexpids(q)
    #@show (idx)
    if idx == 0
        # This worker is not assigned a piece
        print("NO WORKERS ASSIGNED")
        return 1:0
    end
    nchunks = length(procs(q))
    splits = [round(Int, s) for s in linspace(0,length(q),nchunks+1)]
    splits[idx]+1:splits[idx+1]
end
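
To see how myrange divides the work, here is a small sketch of the split it computes (same 0.4-era linspace call as above; the length 100 and 4 workers are made-up numbers for illustration):

```julia
# Hypothetical illustration of the split computed inside myrange:
# 4 workers sharing an array of length 100.
nchunks = 4
len = 100
splits = [round(Int, s) for s in linspace(0, len, nchunks + 1)]
# splits == [0, 25, 50, 75, 100]; worker idx gets splits[idx]+1:splits[idx+1],
# e.g. worker 2 processes 26:50.
```

Each worker therefore touches a disjoint, contiguous slab of the array, which is why the chunks can be written concurrently without locking.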

function combine_arrays_chunk!(array_1, array_2, output_array, func, length_range)
    #@show (length_range)
    for i in length_range
        output_array[i] = func(array_1[i], array_2[i]);
        #hardwired example for func = +
        #output_array[i] = +(array_1[i], array_2[i]);
    end
    output_array
end

combine_arrays_shared_chunk!(array_1, array_2, output_array, func) = combine_arrays_chunk!(array_1, array_2, output_array, func, myrange(array_1))

function combine_arrays_shared(array_1::SharedArray, array_2::SharedArray, func)
    if size(array_1) != size(array_2)
        error("inputs are not the same size")
    end
    output_array = SharedArray(Float64, size(array_1))
    @sync begin
        for p in procs(array_1)
            @async remotecall_wait(p, combine_arrays_shared_chunk!, array_1, array_2, output_array, func)
        end
    end
    output_array
end

This works, so one can do

strain_div  = combine_arrays_shared(eps_1,eps_2,+);
strain_tot  = combine_arrays_shared(eps_1,eps_2,hypot);

with the correct results, and the output is a shared array as required. But ... it's quite slow. For my test cases (each array about 200 MB) it's actually quicker to convert the SharedArrays to normal arrays on one processor, calculate, and then convert back to a SharedArray; I guess that will change once I move up to GBs. If I hardwire combine_arrays_shared to do only addition (or some other specific function), I get the speed increase, but with the function passed as an argument within combine_arrays_shared the whole thing is slow (10 times slower than the hardwired addition).

I've looked at the FastAnonymous.jl package but I can't see how it would work in this case. I tried, and failed. Any ideas?

I might just resort to writing a different combine_arrays_... function for each basic function I use, or making the func argument an option and calling different functions from within combine_arrays_shared, but I want it to be more elegant! This is also a good way to learn more about Julia.

Harry

Upvotes: 2

Views: 96

Answers (1)

tholy

Reputation: 12179

This question actually has nothing to do with SharedArrays, and is just "how do I pass functions-as-arguments and get better performance?"

The way FastAnonymous works (and, similarly, the way closures will soon work in Julia) is to create a type with a call method. If you're having trouble with FastAnonymous for some reason, you can always do it manually:

julia> immutable Foo end

julia> Base.call(f::Foo, x, y) = x*y
call (generic function with 1036 methods)

julia> function applyf(f, X)
           s = zero(eltype(X))
           for x in X
               s += f(x, x)
           end
           s
       end
applyf (generic function with 1 method)

julia> X = rand(10^6);

julia> f = Foo()
Foo()

# Run the function once with each type of argument to JIT-compile
julia> applyf(f, X)
333375.63216645207

julia> applyf(*, X)
333375.63216645207

# Compile anything used by @time
julia> @time 1
  0.000004 seconds (148 allocations: 10.151 KB)
1

# Now let's benchmark
julia> @time applyf(f, X)
  0.002860 seconds (5 allocations: 176 bytes)
333433.439233112

julia> @time applyf(*, X)
  0.142411 seconds (4.00 M allocations: 61.035 MB, 19.24% gc time)
333433.439233112

Note the big increase in speed and greatly-reduced memory consumption.
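
Applied back to the question, a minimal sketch looks like this (same 0.4-era syntax as above; the wrapper type names Plus and Hypot are made up for illustration):

```julia
# Hypothetical wrapper types for the two operations used in the question.
# Define these with @everywhere so the worker processes also have them.
immutable Plus end
Base.call(::Plus, x, y) = x + y

immutable Hypot end
Base.call(::Hypot, x, y) = hypot(x, y)

# The combine machinery is unchanged; each call now specializes on the
# concrete wrapper type instead of a generic Function:
#   strain_div = combine_arrays_shared(eps_1, eps_2, Plus())
#   strain_tot = combine_arrays_shared(eps_1, eps_2, Hypot())
```

Because Plus() and Hypot() are singletons with concrete types, the inner loop in combine_arrays_chunk! compiles a specialized method for each, which is where the speed-up comes from.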

Upvotes: 1
