I have a very large sparse matrix in Octave and I want to get the variance of each row. If I use
```octave
std(A,1);
```
it crashes because memory is exhausted.
Why is this?
The variance should be very easy to calculate for a sparse matrix, shouldn't it?
How can I make this work?
If you want the standard deviation of just the nonzero entries in each column, then you can do:
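```octave
[nrows, ncols] = size(A);
counts = full(sum(spones(A), 1));                 % number of nonzeros in each column
means = full(sum(A, 1)) ./ max(counts, 1);        % mean of the nonzeros (0 for an all-zero column)
[i, j] = find(A);                                 % row/column indices of the nonzeros
v = means(j);                                     % the column mean for each nonzero
placedmeans = sparse(i, j, v(:), nrows, ncols);   % column mean at each nonzero position of A
vars = full(sum((A - placedmeans).^2, 1)) ./ max(counts, 1);
stds = sqrt(vars);
```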
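To sanity-check the nonzeros-only version, the sketch below compares it on a small random matrix against a plain loop that calls `std` on each column's nonzeros. The matrix size, density, seed and tolerance are arbitrary choices for illustration, and `std(x, 1)` is used so both sides divide by the count rather than count - 1. Two columns are zeroed on purpose, including the last one, to exercise the all-zero and trailing-zero cases mentioned in the EDIT note.

```octave
% Small test matrix; size, density and seed are arbitrary illustration choices.
rand("state", 42);
A = sprand(50, 8, 0.2);
A(:, 3) = 0;    % force an all-zero column
A(:, 8) = 0;    % and an all-zero trailing column

% Nonzeros-only column std, as in the answer.
[nrows, ncols] = size(A);
counts = full(sum(spones(A), 1));
means = full(sum(A, 1)) ./ max(counts, 1);
[i, j] = find(A);
v = means(j);
placedmeans = sparse(i, j, v(:), nrows, ncols);
stds = sqrt(full(sum((A - placedmeans).^2, 1)) ./ max(counts, 1));

% Reference: loop over columns, std of the nonzeros only (0 for an empty column).
ref = zeros(1, ncols);
for c = 1:ncols
  nz = nonzeros(A(:, c));
  if ~isempty(nz)
    ref(c) = std(nz, 1);
  end
end
fprintf("max abs difference: %g\n", max(abs(stds - ref)));
```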
I can't imagine a situation where you would want the standard deviation of all the entries in each column of a sparse matrix (zeros included), but if you do, you only need to count the zeros in each column and add their contribution to the sum of squared deviations, since each zero differs from the column mean by exactly the mean:
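```octave
[nrows, ncols] = size(A);
zerocounts = nrows - full(sum(spones(A), 1));     % number of zeros in each column
means = full(sum(A, 1)) ./ nrows;                 % mean over all entries, zeros included
[i, j] = find(A);
v = means(j);
placedmeans = sparse(i, j, v(:), nrows, ncols);   % column mean at each nonzero position of A
% squared deviations of the nonzeros, plus (0 - mean)^2 for every zero
vars = (full(sum((A - placedmeans).^2, 1)) + zerocounts .* means.^2) ./ nrows;
stds = sqrt(vars);
```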
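The `zerocounts` term comes from splitting one column's sum of squared deviations into its nonzero and zero entries. Writing $n$ for `nrows`, $m$ for the column mean over all $n$ entries, $S$ for the set of rows holding a nonzero, and $z = n - |S|$ for `zerocounts` (notation introduced here only for this derivation):

$$
\sum_{r=1}^{n} (a_r - m)^2 \;=\; \sum_{r \in S} (a_r - m)^2 \;+\; \sum_{r \notin S} (0 - m)^2 \;=\; \sum_{r \in S} (a_r - m)^2 \;+\; z\,m^2,
\qquad m = \frac{1}{n}\sum_{r=1}^{n} a_r .
$$

Dividing by $n$ gives the `vars` line. Only the first sum touches individual entries, and it runs over the nonzeros alone, so nothing dense has to be formed.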
Also, I don't know whether you want the sample variance, which subtracts one from the denominator of vars (counts - 1 and nrows - 1 instead of counts and nrows, respectively).
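If you do want the sample variance, a minimal sketch is to change only the divisor: `max(counts - 1, 1)` in the nonzeros-only case, or `nrows - 1` in the zeros-included case. The code below does the latter and compares it with `std(full(A))`, whose default normalization is n - 1. The test matrix is an arbitrary small example, because `full(A)` is exactly what cannot be afforded for the real matrix; the seed and tolerance are also arbitrary.

```octave
% Small test matrix; size, density and seed are arbitrary illustration choices.
rand("state", 7);
A = sprand(40, 6, 0.25);
A(end, :) = 0;   % trailing all-zero row

% Zeros-included column std with the sample (n - 1) divisor.
[nrows, ncols] = size(A);
zerocounts = nrows - full(sum(spones(A), 1));
means = full(sum(A, 1)) ./ nrows;
[i, j] = find(A);
v = means(j);
placedmeans = sparse(i, j, v(:), nrows, ncols);
sampvars = (full(sum((A - placedmeans).^2, 1)) + zerocounts .* means.^2) ./ (nrows - 1);
sampstds = sqrt(sampvars);

% Dense reference, only affordable because this A is tiny.
ref = std(full(A));
fprintf("max abs difference: %g\n", max(abs(sampstds - ref)));
```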
EDIT: corrected a bug that built the placedmeans matrix with the wrong size whenever A ends in a row or column of all zeros. Also, the first case now returns a mean/var/std of zero for an all-zero column (previously it would have been NaN).
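The question asks for the variance of each row, while the code above works down columns, as `std(A,1)` does (the `1` there selects the normalization, not the dimension). One way to get per-row results without forming anything dense is to sum along dimension 2 and index the means by row instead of by column. This sketch does the zeros-included case and compares it with `std(full(A), 1, 2)` on an arbitrary small test matrix; the seed and tolerance are illustrative choices.

```octave
% Small test matrix; size, density and seed are arbitrary illustration choices.
rand("state", 3);
A = sprand(30, 12, 0.2);
A(:, end) = 0;   % trailing all-zero column

% Zeros-included std of each row (divide by ncols, like std(A, 1, 2)).
[nrows, ncols] = size(A);
zerocounts = ncols - full(sum(spones(A), 2));        % zeros in each row (column vector)
means = full(sum(A, 2)) ./ ncols;                    % row means, zeros included
[i, j] = find(A);
placedmeans = sparse(i, j, means(i), nrows, ncols);  % row mean at each nonzero position
rowvars = (full(sum((A - placedmeans).^2, 2)) + zerocounts .* means.^2) ./ ncols;
rowstds = sqrt(rowvars);

% Dense reference, only affordable because this A is tiny.
ref = std(full(A), 1, 2);
fprintf("max abs difference: %g\n", max(abs(rowstds - ref)));
```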