Reputation: 1995
I'm trying to manipulate sparse binary matrices in GNU Octave, and it's using way more memory than I expect, and relevant sparse-matrix functions don't behave the way I want them to. I see this question about higher-than-expected sparse-matrix storage in MATLAB, which suggests that this matrix should consume even more memory, but helped explain (only) part of this situation.
For a sparse, binary matrix, I can't figure out any way to get Octave to NOT STORE the array of values (they're always implicitly 1
, so need not be stored). Can this be done? Octave always seems to consume memory for a values array.
A trimmed-down example demonstrating the situation: create random sparse matrix, turn it into "binary":
mys=spones(sprandn(1024,1024,.03)); nnz(mys), whos mys
Shows the situation. The consumed size is consistent with the storage mechanism outlined in aforementioned SO answer and expanded below, if spones()
creates an array of storage-class double
and if all indices are 32-bit (i.e.,
TotalStorageSize - rowIndices - columnIndices == NumNonZero*sizeof(double)-- unnecessarily storing these values (all
1
s as double
s) is over half of the total memory consumed by this 3%-sparse object.
After messing with this (for too long) while composing this question, I discovered some partial workarounds, so I'm going to "self-answer" (only) part of the question for continuity (hopefully), but I didn't figure out an adequate answer to main question:
How do I create an efficiently-stored ("no-/implicit-values") binary matrix in Octave?
Additional background on storage format follows...
The Octave docs say the storage format for sparse matrices uses format Compressed Sparse Column (CSC). This seems to imply storing the following arrays (expanding on aforementioned SO answer, with canonical Yale format labels
and tweaks for column-major order):
A
), number-of-nonzeros (NNZ) entries of storage-class size;IA
), NNZ entries of index size (hopefully int64
but maybe int32
);JA
), number-of-columns-plus-1 entries of index size)In this case, for binary-only storage, I hope there's a way to completely avoid storing array (A
), but I can't figure it out.
Upvotes: 1
Views: 1393
Reputation: 1995
Full disclosure: As noted above, as I was composing this question, I discovered a workaround to reduce memory usage, so I'm "self-answering" part of this here, but it still isn't fully satisfying, so I'm still listening for a better actual answer to storage of a sparse binary matrix without a trivial, bloated, unnecessary values array...
To get a binary-like value out of a number-like value and reduce the memory usage in this case, use "logical" storage, created by logical(X)
. For example, building from above,
logicalmys = logical(mys);
creates a sparse bool matrix
, that takes up less memory (1-byte logical
rather than 8-byte double
for the values array).
Adding more information to the whos
information using whos_line_format
helps illuminate the situation: The default string includes 5 of the 7 properties (see docs for more). I'm using the format string
whos_line_format(" %a:4; %ln:6; %cs:16:6:1; %rb:12; %lc:8; %e:10; %t:20;\n")
to add display of "elements", and "type" (which is distinct from "class").
With that, whos mys logicalmys
shows something like
Attr Name Size Bytes Class Elements Type
==== ==== ==== ===== ===== ======== ====
mys 1024x1024 391100 double 32250 sparse matrix
logicalmys 1024x1024 165350 logical 32250 sparse bool matrix
So this shows a distinction between sparse matrix
and sparse bool matrix
. However, the total memory consumed by logicalmys
is consistent with actually storing an array of NNZ booleans (1-byte) -- That is:
in numbers,
165350 - 32250*4 - 1025*4 == 32250
. So we're still storing 32250 elements, all of which are 1
. Further, if you set one of the 1
-elements to zero, it reduces the reported storage! For a good time, try: pick a nonzero element, e.g., (42,1)
, then zero it: logicalmys(42,1) = 0;
then whos
it!
My hope is that this is correct, and that this clarifies some things for those who might be interested. Comments, corrections, or actual answers welcome!
Upvotes: 1