Reputation: 1
For example, suppose I start with a dense matrix:
A = numpy.array([[0, 0],[0, 1]])
and then convert it to a CSC sparse matrix using csc_matrix(A). The matrix is then stored as:
(1, 1) 1
#(row, column) val
which comprises three values. Why is the size of the sparse matrix only 8 bytes, even though the computer is essentially storing 3 values? Surely the size of the matrix would be at least 12 bytes, since an integer usually occupies 4 bytes.
Upvotes: 0
Views: 159
Reputation: 16941
I don't agree that the size of the sparse matrix is 8 bytes. I may be missing something, but if I do this, I get a very different answer:
>>> import sys
>>> import numpy
>>> from scipy import sparse
>>> A = numpy.array([[0, 0],[0, 1]])
>>> s = sparse.csc_matrix(A)
>>> s
<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in Compressed Sparse Column format>
>>> sys.getsizeof(s)
56
This is the size of the matrix object itself in memory, and I assure you that it is accurate: Python must know how big the object is, because it does the memory allocation. Note that sys.getsizeof does not count the arrays the object refers to.
If, on the other hand, you use s.data.nbytes:
>>> s.data.nbytes
4
This gives the expected answer of 4. It is expected because s reports itself as having one stored element of type int32. The value returned, according to the docs, "does not include memory consumed by non-element attributes of the array object."
This is not a more accurate result, just an answer to a different question, as 35421869 makes clear.
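To make the distinction concrete, here is a minimal sketch that breaks the storage down by array. It relies on the fact that a CSC matrix keeps three NumPy arrays (data, indices and indptr). The helper name csc_array_bytes is hypothetical, and the exact byte counts depend on the dtypes chosen on your platform, so none are shown.

```python
import numpy
from scipy import sparse


def csc_array_bytes(m):
    """Return the bytes held by each of the three arrays backing a CSC matrix.

    Hypothetical helper for illustration; it is not part of SciPy.
    """
    return {
        "data": m.data.nbytes,        # the explicitly stored values
        "indices": m.indices.nbytes,  # row index of each stored value
        "indptr": m.indptr.nbytes,    # column start offsets (n_cols + 1 entries)
    }


A = numpy.array([[0, 0], [0, 1]])
s = sparse.csc_matrix(A)

sizes = csc_array_bytes(s)
total = sum(sizes.values())
print(sizes, total)
```

For this 2x2 matrix with one stored element, data and indices each hold one entry and indptr holds three, so the three arrays together take more than s.data.nbytes alone.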
I can't explain why you report a value of 8 bytes when the result 4 is clearly correct. One possibility is that numpy.array([[0, 0],[0, 1]]) is not in fact what was converted to the sparse matrix. Where did the value 5 come from? The value of 8 is consistent with a starting array of numpy.array([[0, 0],[0, 5.0]]).
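The effect of the element type can be checked directly. The sketch below fixes the dtype explicitly (an assumption added here, since the default integer width NumPy picks varies by platform) and compares a matrix of 4-byte integers with one of 8-byte floats.

```python
import numpy
from scipy import sparse

# Explicit dtypes are an assumption for reproducibility; without them NumPy
# chooses a platform-dependent default integer type.
int_matrix = sparse.csc_matrix(numpy.array([[0, 0], [0, 1]], dtype=numpy.int32))
float_matrix = sparse.csc_matrix(numpy.array([[0, 0], [0, 5.0]], dtype=numpy.float64))

print(int_matrix.data.dtype, int_matrix.data.nbytes)      # int32 4
print(float_matrix.data.dtype, float_matrix.data.nbytes)  # float64 8
```

A 64-bit integer element would also occupy 8 bytes, so on a platform where numpy.array defaults to int64 the integer array could report 8 as well.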
Your figure of 12 bytes is based on two unmet expectations.
nbytes does not report the total memory cost of storing the elements of the matrix. It reports a numpy invariant (over many different kinds of matrix): x.nbytes == numpy.prod(x.shape) * x.itemsize. This is an important quantity because the explicitly stored elements of the matrix form its biggest subsidiary data structure and must be allocated in contiguous memory.
Upvotes: 1