surya ambati

Reputation: 104

Sparse Matrix / CSC Matrix in PySpark

Can anyone please explain the Sparse Matrix (CSC Matrix) format? The documentation describes it as follows:

Column-major sparse matrix. The entry values are stored in Compressed Sparse Column (CSC) format. For example, the following matrix

   1.0 0.0 4.0
   0.0 3.0 5.0
   2.0 0.0 6.0
 
is stored as values: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rowIndices=[0, 2, 1, 0, 1, 2], colPointers=[0, 2, 3, 6].

I got the above example from https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/linalg/SparseMatrix.html

I understand what values and rowIndices are, but I do not understand colPointers. Could someone help me understand it?

Upvotes: 0

Views: 259

Answers (1)

hpaulj

Reputation: 231395

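colPointers = [0, 2, 3, 6] gives the slice of values and rowIndices that belongs to each column:

values and rowIndices for the 1st column: [0:2]

for the 2nd column: [2:3]

for the 3rd column: [3:6]

So the 1st column holds values [1.0, 2.0] at rows [0, 2], the 2nd holds [3.0] at row [1], and the 3rd holds [4.0, 5.0, 6.0] at rows [0, 1, 2].

Or, to look at it another way, the differences between consecutive pointers, [2, 1, 3], tell us how many stored terms there are in each column.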
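To make the slicing concrete, here is a minimal pure-Python sketch that rebuilds the dense matrix from the three arrays in the question. The helper name csc_to_dense and its argument order are my own choices for illustration, not part of the Spark API.

```python
def csc_to_dense(num_rows, num_cols, col_ptrs, row_indices, values):
    """Rebuild a dense row-major matrix from CSC arrays (illustrative helper)."""
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for j in range(num_cols):
        # Entries for column j live in the half-open slice [start, end).
        start, end = col_ptrs[j], col_ptrs[j + 1]
        for k in range(start, end):
            dense[row_indices[k]][j] = values[k]
    return dense


# Arrays copied from the example in the question.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
row_indices = [0, 2, 1, 0, 1, 2]
col_ptrs = [0, 2, 3, 6]

dense = csc_to_dense(3, 3, col_ptrs, row_indices, values)

# Number of stored entries per column: differences of consecutive pointers.
counts = [col_ptrs[j + 1] - col_ptrs[j] for j in range(len(col_ptrs) - 1)]

for row in dense:
    print(row)
print(counts)
```

Note that col_ptrs always has one more entry than there are columns, and its last entry equals the total number of stored values. In PySpark itself, as far as I know, the same arrays can be passed as pyspark.mllib.linalg.SparseMatrix(3, 3, col_ptrs, row_indices, values), and toArray() should give the same dense result.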

Upvotes: 2
