Reputation: 3099
Something's odd with the data here.
If I create a scipy.sparse.csr_matrix with the data property containing only 0s and 1s, and then ask it to print the data property, sometimes there are 2s in the output (other times not).
You can see this behaviour here:
from scipy.sparse import csr_matrix
import numpy as np
from collections import OrderedDict

# Generate some fake data.
# This makes an OrderedDict of 10 scipy.sparse.csr_matrix objects,
# with 3 rows and 3 columns and binary (0/1) values.
od = OrderedDict()
for i in range(10):
    row = np.random.randint(3, size=3)
    col = np.random.randint(3, size=3)
    data = np.random.randint(2, size=3)
    print('data is:', data)
    sp_matrix = csr_matrix((data, (row, col)), shape=(3, 3))
    od[i] = sp_matrix

# Print the data in each scipy sparse matrix
for i in range(10):
    print('data stored in sparse matrix:', od[i].data)
It'll print something like this:
data is: [1 0 1]
data is: [0 0 1]
data is: [0 0 0]
data is: [0 0 0]
data is: [1 1 1]
data is: [0 0 0]
data is: [1 1 0]
data is: [1 0 1]
data is: [0 0 0]
data is: [0 0 1]
data stored in sparse matrix: [1 1 0]
data stored in sparse matrix: [0 0 1]
data stored in sparse matrix: [0 0]
data stored in sparse matrix: [0 0 0]
data stored in sparse matrix: [2 1]
data stored in sparse matrix: [0 0 0]
data stored in sparse matrix: [1 1 0]
data stored in sparse matrix: [1 1 0]
data stored in sparse matrix: [0 0 0]
data stored in sparse matrix: [1 0 0]
Why does the data stored in the sparse matrix not reflect the data originally put there (there were no 2s in the original data)?
Upvotes: 1
Views: 431
Reputation: 33532
I assume that this way of creating the matrix:
sp_matrix = csr_matrix((data, (row, col)), shape=(3, 3))
uses coo_matrix under the hood (I had not found the relevant source when I first wrote this; see the edit at the bottom).
In this case, the docs say (for COO):
By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)
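Your random-matrix routine does not check for duplicate entries, so whenever the same (row, col) pair is drawn more than once, the values for that cell are added together. Two 1s landing on the same cell are stored as a single 2, which also explains why the stored data array is sometimes shorter than the input.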
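To make the effect reproducible without relying on random draws, here is a minimal sketch with hand-picked indices (the specific row, col and data values are my own illustration, not taken from the question). Two of the three entries deliberately point at cell (0, 0):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative input: the first two entries share the coordinate (0, 0).
row = np.array([0, 0, 1])
col = np.array([0, 0, 2])
data = np.array([1, 1, 1])

m = csr_matrix((data, (row, col)), shape=(3, 3))

# The two entries at (0, 0) are merged into one stored value.
print(m.data)       # stored values after construction
print(m.toarray())  # dense view of the same matrix
```

The input contains only 1s, yet the stored data contains a 2 and has one element fewer than the input, which is the same pattern as the [1 1 1] -> [2 1] line in the question's output.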
Edit: OK, I think I found the code.
csr_matrix has no constructor code of its own; it inherits its constructor from _cs_matrix, which contains:
else:
    if len(arg1) == 2:
        # (data, ij) format
        from .coo import coo_matrix
        other = self.__class__(coo_matrix(arg1, shape=shape))
        self._set_self(other)
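As a rough check of that reading of the source, the sketch below builds the same matrix two ways: directly, and by going through coo_matrix explicitly. It also includes has_duplicate_coords, a hypothetical helper of my own (not part of SciPy) for spotting repeated (row, col) pairs before construction. The input values are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix


def has_duplicate_coords(row, col):
    """Hypothetical helper: True if any (row, col) pair occurs more than once."""
    coords = np.stack([row, col], axis=1)
    return len(np.unique(coords, axis=0)) < len(coords)


# Illustrative input with a repeated coordinate (0, 0).
row = np.array([0, 0, 1])
col = np.array([0, 0, 2])
data = np.array([1, 1, 1])

# Route 1: the (data, (row, col)) constructor from the question.
direct = csr_matrix((data, (row, col)), shape=(3, 3))

# Route 2: build the COO matrix explicitly, then convert to CSR.
coo = coo_matrix((data, (row, col)), shape=(3, 3))
via_coo = coo.tocsr()

print(coo.data)      # COO still holds every entry as given
print(via_coo.data)  # duplicates are summed during conversion to CSR
same = np.array_equal(direct.toarray(), via_coo.toarray())
print(same)
print(has_duplicate_coords(row, col))
```

If duplicates should not be summed, one option is to run a check like this on the generated indices and redraw or deduplicate them before calling csr_matrix.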
Upvotes: 2