Syed Arefinul Haque

Reputation: 1335

Efficiently subtract the mean off the columns of a sparse matrix in Python

Say we have a SciPy sparse matrix:

from scipy.sparse import csc_matrix
mat = csc_matrix([[0, 1, 2],
                  [0, -1, 3]])

The column means are 0, 0, and 2.5, so after subtracting each column's mean from the elements of that column, the result should be:

[
   [0, 1, -.5],
   [0, -1, .5]
]

Since the matrices are huge, is there an efficient way to compute this, i.e. without calling .toarray()?

Upvotes: 0

Views: 289

Answers (1)

orlp

Reputation: 117991

There is no efficient way: unless almost all of your column means are zero, the resulting matrix will not be sparse.
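To see the fill-in concretely, here is a small sketch. The 1000 x 50 random matrix and its 1% density are made up for illustration and are not from the question. It densifies only because the example is tiny, which is exactly the step you would want to avoid at scale.

```python
import numpy as np
from scipy import sparse

# Hypothetical test matrix: 1000 x 50 with about 1% of entries stored.
mat = sparse.random(1000, 50, density=0.01, format="csc", random_state=0)

# Column means can be computed without densifying.
col_means = np.asarray(mat.mean(axis=0)).ravel()

# Densify only because this example is small.
centered = mat.toarray() - col_means

density_before = mat.nnz / (mat.shape[0] * mat.shape[1])
density_after = np.count_nonzero(centered) / centered.size
print(density_before, density_after)
```

Every column with a nonzero mean turns its implicit zeros into -mean, so the centered matrix ends up almost fully populated.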

Your only option is to carry this information separately forward (e.g. as a constant offset per column) and change your algorithm(s) appropriately, or to switch to a dense matrix.
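As a rough sketch of the "carry the offset separately" idea, assuming your downstream algorithm only needs matrix-vector products with the centered matrix: the centered matrix equals mat minus a rank-one term built from the column means, so both products can be computed from the sparse matrix plus a cheap correction. The helper names centered_matvec and centered_rmatvec are made up for this example.

```python
import numpy as np
from scipy.sparse import csc_matrix

mat = csc_matrix([[0, 1, 2],
                  [0, -1, 3]], dtype=float)

# Per-column offsets, computed without densifying the matrix.
col_means = np.asarray(mat.mean(axis=0)).ravel()

def centered_matvec(x):
    """Compute (mat - ones * col_means^T) @ x without forming the dense matrix."""
    # The rank-one term contributes the same scalar to every row.
    return mat @ x - col_means @ x

def centered_rmatvec(y):
    """Compute (mat - ones * col_means^T)^T @ y using only sparse products."""
    return mat.T @ y - col_means * y.sum()

x = np.array([1.0, 2.0, 3.0])
print(centered_matvec(x))
```

If the algorithm accepts an operator instead of an explicit matrix (many iterative solvers do), these two functions could be wrapped in a scipy.sparse.linalg.LinearOperator. Whether that fits depends on what you do with the centered matrix afterwards.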

Upvotes: 2
