aerin

Reputation: 22624

can't understand scipy.sparse.csr_matrix example

I can't wrap my head around the csr_matrix examples in the SciPy documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html

Can someone explain how this example works?

>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

I believe this is following this format.

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].

What is a here?

Upvotes: 10

Views: 18789

Answers (5)

quasar

Reputation: 474

@Rohit Pandey stated it correctly; I just want to add an example.

When most of the elements of a matrix are 0, we call it a sparse matrix. A sparse format leaves out the zero elements, which saves memory and computing time. We store only the non-zero items, each with its respective row and column index. For example, take this matrix:

0 3 0 4
0 5 7 0
0 0 0 0
0 2 6 0

We build the sparse representation by listing, for each non-zero item, its row index, then its column index, and finally its value:

Row    0 0 1 1 3 3
Column 1 3 1 2 1 2
Value  3 4 5 7 2 6

By reversing the process we get the simple matrix form from the sparse form.
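
The round trip described above can be sketched with plain NumPy. This is only an illustration: the names `dense`, `rows`, `cols`, `vals` and `rebuilt` are made up for this sketch, and `np.nonzero` is used here as one way to collect the non-zero items.

```python
import numpy as np

# The 4x4 example matrix from this answer.
dense = np.array([[0, 3, 0, 4],
                  [0, 5, 7, 0],
                  [0, 0, 0, 0],
                  [0, 2, 6, 0]])

# Dense -> triplets: positions of the non-zero items, then their values.
rows, cols = np.nonzero(dense)
vals = dense[rows, cols]
print(rows)  # [0 0 1 1 3 3]
print(cols)  # [1 3 1 2 1 2]
print(vals)  # [3 4 5 7 2 6]

# Triplets -> dense: start from zeros and write each value back.
rebuilt = np.zeros_like(dense)
rebuilt[rows, cols] = vals
print(np.array_equal(rebuilt, dense))  # True
```

The same three arrays can be handed to `csr_matrix((vals, (rows, cols)), shape=(4, 4))`, which is the constructor form used in the question.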

Upvotes: 0

IndPythCoder

Reputation: 753

Represent the "data" in a 4 X 4 Matrix:

import numpy as np
from scipy.sparse import csr_matrix

data = np.array([10,0,5,99,25,9,3,90,12,87,20,38,1,8])
indices = np.array([0,1,2,3,0,2,3,0,1,2,3,1,2,3])
indptr  = np.array([0,4,7,11,14])

illustration of CSR_Matrix

  • 'indptr' (index pointers) is an array of offsets into 'indices' (the column indices) and 'data'
  • indices[indptr[i]:indptr[i+1]] holds the column indices of row i, and data[indptr[i]:indptr[i+1]] holds the matching values
  • 14 is len(data), so 'indptr' can also be written as np.array([0,4,7,11,len(data)])
  • 0,4 --> 0:4 selects the column indices 0,1,2,3 of row 0
  • 4,7 --> 4:7 selects the column indices 0,2,3 of row 1
  • 7,11 --> 7:11 selects the column indices 0,1,2,3 of row 2
  • 11,14 --> 11:14 selects the column indices 1,2,3 of row 3
# Represent the data as a 4x4 matrix
# (np.int was removed in NumPy 1.24; the builtin int does the same job)

a = csr_matrix((data,indices,indptr),shape=(4,4),dtype=int)
a.todense()

matrix([[10,  0,  5, 99],
        [25,  0,  9,  3],
        [90, 12, 87, 20],
        [ 0, 38,  1,  8]])
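
To make the role of 'indptr' concrete, here is a minimal sketch that expands the three arrays by hand, without SciPy. It is not how SciPy is implemented internally; the names `dense`, `n_rows`, `n_cols`, `start` and `end` are made up for the illustration.

```python
# Same arrays as above, written as plain lists so the walk needs no imports.
data    = [10, 0, 5, 99, 25, 9, 3, 90, 12, 87, 20, 38, 1, 8]
indices = [0, 1, 2, 3, 0, 2, 3, 0, 1, 2, 3, 1, 2, 3]
indptr  = [0, 4, 7, 11, 14]
n_rows, n_cols = 4, 4

dense = [[0] * n_cols for _ in range(n_rows)]
for i in range(n_rows):
    # Row i owns the slice indptr[i]:indptr[i+1] of both 'indices' and 'data'.
    start, end = indptr[i], indptr[i + 1]
    for k in range(start, end):
        dense[i][indices[k]] = data[k]

for r in dense:
    print(r)
# [10, 0, 5, 99]
# [25, 0, 9, 3]
# [90, 12, 87, 20]
# [0, 38, 1, 8]
```

Note that the 0 in the first row is stored explicitly in 'data', while the 0s in rows 1 and 3 are simply absent from 'indices'.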

Another Stackoverflow explanation

Upvotes: 4

pplkjh

Reputation: 301

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

From the above arrays:

a = np.zeros((3, 3), dtype=int)   # start from an all-zero 3x3 matrix
for k in range(len(data)):        # k = 0..5
    a[row[k], col[k]] = data[k]

                  a 
row[0],col[0] = [0,0] = 1 (from data[0])  
row[1],col[1] = [0,2] = 2 (from data[1])  
row[2],col[2] = [1,2] = 3 (from data[2])  
row[3],col[3] = [2,0] = 4 (from data[3])  
row[4],col[4] = [2,1] = 5 (from data[4])  
row[5],col[5] = [2,2] = 6 (from data[5])

So matrix 'a', arranged with shape (3, 3), is:

a
   0  1  2
0 [1, 0, 2]  
1 [0, 0, 3]  
2 [4, 5, 6]
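
As a quick check, the hand-built 'a' can be compared against what SciPy produces from the same three arrays. This is a sketch assuming NumPy and SciPy are installed; `m` and `same` are names chosen just for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Build 'a' by hand from the relationship a[row[k], col[k]] = data[k].
a = np.zeros((3, 3), dtype=int)
for k in range(len(data)):
    a[row[k], col[k]] = data[k]

# Let SciPy build the same matrix from the (data, (row, col)) form.
m = csr_matrix((data, (row, col)), shape=(3, 3))
same = np.array_equal(m.toarray(), a)
print(same)       # True

# The CSR arrays SciPy stores for this example.
print(m.indptr)   # [0 2 3 6]
print(m.indices)  # [0 2 2 0 1 2]
print(m.data)     # [1 2 3 4 5 6]
```

So 'a' in the documentation is just the matrix being described; the (row, col) pairs are an input convenience, and the stored form is the indptr/indices/data triple.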

Upvotes: 19

AutoRun

Reputation: 11

As far as I understand, the row and col arrays hold the indices that correspond to the non-zero values in the matrix: a[0, 0] = 1, a[0, 2] = 2, a[1, 2] = 3 and so on. Since there are no indices for a[0, 1], a[1, 0] and a[1, 1], those entries of the matrix are 0.

Also, maybe this little intro will be helpful for you: https://www.youtube.com/watch?v=Lhef_jxzqCg

Upvotes: 1

Rohit Pandey

Reputation: 2681

This is a sparse matrix, so it stores explicit indices and the values at those indices. For example, row=0 and col=0 correspond to 1 (the first entries of all three arrays in your example), so the [0,0] entry of the matrix is 1. And so on.

Upvotes: 9
