Reputation: 28222
A sparse matrix in Julia only stores nonzero elements.
Some functions, such as log(x+1)
(in all bases),
map zero to zero, and thus don't need to be applied to those zero elements.
(I think we would call this a Monoid homomorphism.)
How can I use this fact to speed up an operation?
Example code:
X = sprand(10^4,10^4, 10.0^-5, rand)
function naiveLog2p1(N::SparseMatrixCSC{Float64,Int64})
log2(1+N) |> sparse
end
Running:
@time naiveLog2p1(X)
Output is:
elapsed time: 2.580125482 seconds (2289 MB allocated, 6.86% gc time in 3 pauses with 0 full sweep)
On a second time (so that the function is expected to be already compiled):
elapsed time: 2.499118888 seconds (2288 MB allocated, 8.17% gc time in 3 pauses with 0 full sweep)
Little change, presumably cos it is so simple to compile.
Upvotes: 3
Views: 268
Reputation: 809
As per suggestion of the Julia manual on "Sparse matrix operations" I would convert the sparse matrix into a dense one using findnz()
, do the log operations on the values and the reconstruc the sparse matrix with sparse()
.
function improvedLog2p1(N::SparseMatrixCSC{Float64,Int64})
I,J,V = findnz(N)
return sparse(I,J,log2(1+V))
end
@time improvedLog2p1(X)
elapsed time: 0.000553508 seconds (473288 bytes allocated)
Upvotes: 2
Reputation: 11664
My solution would be to actually operate on the inside of the data structure itself:
mysparselog(N::SparseMatrixCSC) =
SparseMatrixCSC(N.m, N.n, copy(N.colptr), copy(N.rowval), log2(1+N.nzval))
Note that if you want to operate on the sparse matrix in place, which would be fairly often in practice I imagine, this would be a zero-memory operation. Benchmarking reveals this performs similar to the @Oxinabox answer, as it is about the same in terms of memory operations (although that answer doesn't actually return the new matrix, as shown by the mean
output):
with warmup times removed:
naiveLog2p1
elapsed time: 1.902405905 seconds (2424151464 bytes allocated, 10.35% gc time)
mean(M) => 0.005568094618997372
mysparselog
elapsed time: 0.022551705 seconds (24071168 bytes allocated)
elapsed time: 0.025841895 seconds (24071168 bytes allocated)
mean(M) => 0.005568094618997372
improvedLog2p1
elapsed time: 0.018682775 seconds (32068160 bytes allocated)
elapsed time: 0.027129497 seconds (32068160 bytes allocated)
mean(M) => 0.004995127985160583
Upvotes: 1
Reputation: 28222
What you are looking for is the sparse nonzeros function.
nonzeros(A)
Return a vector of the structural nonzero values in sparse matrix
A
. This includes zeros that are explicitly stored in the sparse matrix. The returned vector points directly to the internal nonzero storage ofA
, and any modifications to the returned vector will mutateA
as well.
You can use this as below:
function improvedLog2p1(N::SparseMatrixCSC{Float64,Int64})
M = copy(N)
ms = nonzeros(M) #Creates a view,
ms = log2(1+ms) #changes to ms, change M
M
end
@time improvedLog2p1(X)
running for the first time output is:
elapsed time: 0.002447847 seconds (157 kB allocated)
running for the second time output is:
0.000102335 seconds (109 kB allocated)
That is a 4 orders of magnitude improvement in speed and memory use.
Upvotes: 0