user1691278

Reputation: 1895

In place update columns of a matrix

I'm trying to optimize the speed and allocation of this loop

using Random  # provides rand!

function loop(n,k)
    m = rand(10,n)
    r = rand(10,1)
    for j in 1:k
        for i in 2:size(m,2)
            @inbounds m[:,i-1] = m[:,i] + rand!(r)
        end
    end
end

The memory allocation is quite large: @time loop(10000,30) reports 599.94k allocations, and the count grows with k. I think there are two contributing factors: (1) the allocation of m[:,i] and (2) the allocation of m[:,i-1]. I hoped @inbounds would help, but the allocations are the same with or without it.

Is there a way to reduce allocations? I'm not really creating any new objects, so I expected the allocations to be independent of k. I tried replacing @inbounds with @view, but the code did not run. I don't think I can use broadcast! here.

Upvotes: 2

Views: 43

Answers (2)

Nils Gudat

Reputation: 13800

Przemyslaw points out the main issues, but I see a further benefit from combining his ideas with your original idea of preallocating r and then filling it in place with rand!:

julia> using Random

julia> function loop3(n,k)
           m = rand(10,n)
           r = rand(10)
           for j in 1:k
               for i in 2:size(m,2)
                   @inbounds m[:,i-1] .= @view(m[:,i]) .+ rand!(r) 
               end
           end
       end
loop3 (generic function with 1 method)

Which gives:

julia> @btime loop(1_000, 1_000)
  266.851 ms (3485003 allocations: 327.64 MiB)

julia> @btime loop2(1_000, 1_000)
  101.003 ms (999002 allocations: 152.51 MiB)

julia> @btime loop3(1_000, 1_000)
  61.447 ms (3 allocations: 78.36 KiB)

That is now essentially just the allocation of m and r:

julia> @btime begin
           rand(10, 1_000); rand(10)
       end
  15.881 μs (3 allocations: 78.36 KiB)
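
If you want the loop itself to be independent of the setup cost, one option is to pass the buffers in and let the @views macro turn the slices into views. This is only a sketch: shift_columns! is a hypothetical name, not something from the question, and I have not benchmarked it here, so check it with @btime or @allocated on your Julia version.

```julia
using Random  # provides rand!

# Hypothetical helper: the same update as loop3, but on caller-supplied buffers,
# so no array is created inside the function.
function shift_columns!(m::AbstractMatrix, r::AbstractVector, k::Integer)
    for _ in 1:k
        for i in 2:size(m, 2)
            rand!(r)  # refill r in place instead of allocating a new vector
            # @views makes every slice in this expression a view (no copy of m[:, i])
            @views m[:, i-1] .= m[:, i] .+ r
        end
    end
    return m
end

m = rand(10, 1_000)
r = similar(m, size(m, 1))  # uninitialized buffer of length 10, filled by rand!
last_col = m[:, end]        # copy of the last column, which the loop never writes
shift_columns!(m, r, 30)
```

The @views macro form is also a way around the problem in the question of @view not running when it is simply swapped in for @inbounds: @view expects a single indexing expression such as m[:, i], while @views accepts a whole expression or block.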

Upvotes: 1

Przemyslaw Szufel

Reputation: 42264

Use views, since they do not copy the data the way a slice does, and use broadcasting so that the result is written directly into the destination column.


function loop2(n,k)
    m = rand(10,n)
    for j in 1:k
        for i in 2:size(m,2)
            @inbounds m[:,i-1] .= @view(m[:,i]) .+ rand(10)
        end
    end
end

The new version is 3x faster, makes 3.5x fewer allocations, and uses less than half the memory:

julia> @btime loop(1000,1000)
  227.918 ms (3485003 allocations: 327.64 MiB)

julia> @btime loop2(1000,1000)
  75.154 ms (999002 allocations: 152.51 MiB)
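
To see why this matters, here is a small sketch on a toy 2x3 matrix (the values are made up for illustration) showing that a slice on the right-hand side is a copy, while a view shares memory with the matrix:

```julia
m = collect(reshape(1.0:6.0, 2, 3))  # 2x3 matrix with columns [1,2], [3,4], [5,6]

c = m[:, 2]        # slicing on the right-hand side copies the column
c[1] = 100.0       # changes only the copy
copy_left_m_alone = (m[1, 2] == 3.0)

v = @view m[:, 2]  # a view refers to the same memory as m
v[1] = 100.0       # writes through to m
view_changed_m = (m[1, 2] == 100.0)

# .= writes into the existing column; the fused broadcast builds no temporary array
m[:, 1] .= v .+ 1.0
```

In the original loop, m[:,i] makes such a copy on every iteration and the + creates another temporary, which is where most of the allocations come from.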

Upvotes: 2
