Frames Catherine White
Frames Catherine White

Reputation: 28222

How can I speed up application of Log(x+1) to a sparse array in Julia

A sparse matrix in Julia only stores nonzero elements.

Some functions, such as log(x+1) (in all bases), map zero to zero, and thus don't need to be applied to those zero elements. (I think we would call this a Monoid homomorphism.)

How can I use this fact to speed up an operation?

Example code:

X = sprand(10^4,10^4, 10.0^-5, rand)
function naiveLog2p1(N::SparseMatrixCSC{Float64,Int64})
    log2(1+N) |> sparse
end

Running:

@time naiveLog2p1(X)

Output is:

elapsed time: 2.580125482 seconds (2289 MB allocated, 6.86% gc time in 3 pauses with 0 full sweep)

On a second time (so that the function is expected to be already compiled):

elapsed time: 2.499118888 seconds (2288 MB allocated, 8.17% gc time in 3 pauses with 0 full sweep)

Little change, presumably cos it is so simple to compile.

Upvotes: 3

Views: 268

Answers (3)

epx
epx

Reputation: 809

As per suggestion of the Julia manual on "Sparse matrix operations" I would convert the sparse matrix into a dense one using findnz(), do the log operations on the values and the reconstruc the sparse matrix with sparse().

function improvedLog2p1(N::SparseMatrixCSC{Float64,Int64})
    I,J,V = findnz(N)
    return sparse(I,J,log2(1+V))
end

@time improvedLog2p1(X)
elapsed time: 0.000553508 seconds (473288 bytes allocated)

Upvotes: 2

IainDunning
IainDunning

Reputation: 11664

My solution would be to actually operate on the inside of the data structure itself:

mysparselog(N::SparseMatrixCSC) =
    SparseMatrixCSC(N.m, N.n, copy(N.colptr), copy(N.rowval), log2(1+N.nzval))

Note that if you want to operate on the sparse matrix in place, which would be fairly often in practice I imagine, this would be a zero-memory operation. Benchmarking reveals this performs similar to the @Oxinabox answer, as it is about the same in terms of memory operations (although that answer doesn't actually return the new matrix, as shown by the mean output):

with warmup times removed:

naiveLog2p1
elapsed time: 1.902405905 seconds (2424151464 bytes allocated, 10.35% gc time)
mean(M) => 0.005568094618997372

mysparselog
elapsed time: 0.022551705 seconds (24071168 bytes allocated)
elapsed time: 0.025841895 seconds (24071168 bytes allocated)
mean(M) => 0.005568094618997372

improvedLog2p1
elapsed time: 0.018682775 seconds (32068160 bytes allocated)
elapsed time: 0.027129497 seconds (32068160 bytes allocated)
mean(M) => 0.004995127985160583

Upvotes: 1

Frames Catherine White
Frames Catherine White

Reputation: 28222

What you are looking for is the sparse nonzeros function.

nonzeros(A)

Return a vector of the structural nonzero values in sparse matrix A. This includes zeros that are explicitly stored in the sparse matrix. The returned vector points directly to the internal nonzero storage of A, and any modifications to the returned vector will mutate A as well.

You can use this as below:

function improvedLog2p1(N::SparseMatrixCSC{Float64,Int64})
    M = copy(N)
    ms = nonzeros(M) #Creates a view, 
    ms = log2(1+ms) #changes to ms, change M
    M
end


@time improvedLog2p1(X)

running for the first time output is:

 elapsed time: 0.002447847 seconds (157 kB allocated)

running for the second time output is:

 0.000102335 seconds (109 kB allocated)

That is a 4 orders of magnitude improvement in speed and memory use.

Upvotes: 0

Related Questions