Abhishek Thakur

Reputation: 16995

list of numpy vectors to sparse array

I have a list of numpy vectors of the format:

    [array([[-0.36314615,  0.80562619, -0.82777381, ...,  2.00876354,2.08571887, -1.24526026]]), 
     array([[ 0.9766923 , -0.05725135, -0.38505339, ...,  0.12187988,-0.83129255,  0.32003683]]),
     array([[-0.59539878,  2.27166874,  0.39192573, ..., -0.73741573,1.49082653,  1.42466276]])]

Only 3 vectors are shown here; the actual list contains hundreds.

The largest vector has around 10 million elements.

The arrays in the list have different numbers of elements, but the maximum length is fixed. Is it possible to build a sparse matrix from these vectors in Python, so that each vector shorter than the maximum length is padded with zeros?

Upvotes: 4

Views: 1581

Answers (3)

Saullo G. P. Castro

Reputation: 58865

In this approach you replace the elements whose absolute value is below your threshold with 0 and then create a sparse matrix from each vector. I suggest coo_matrix, since it is the fastest format to convert to the other sparse types, whichever your purposes require. Then you can use scipy.sparse.vstack() to stack them into a single matrix containing all the vectors in the list:

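    import numpy as np
    import scipy.sparse as ss

    # example data: 5 dense vectors of equal length
    old_list = [np.random.random(100000) for _ in range(5)]

    threshold = 0.01
    for a in old_list:
        # zero out (in place) entries whose magnitude is below the threshold
        a[np.absolute(a) < threshold] = 0
    # one 1 x n COO matrix per vector; vstack requires them all to have the same width
    old_list = [ss.coo_matrix(a) for a in old_list]
    m = ss.vstack(old_list)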
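The snippet above uses vectors of equal length, but `ss.vstack` needs every block to have the same number of columns, so it raises an error for the unequal-length vectors in the question. A minimal sketch of one way to pad, assuming each vector is either 1-D or shaped (1, n) as in the question: build each row's `coo_matrix` from its nonzero entries and pass an explicit `shape=(1, max_len)`. The helper `to_padded_row` and the sample data are hypothetical.

```python
import numpy as np
import scipy.sparse as ss


def to_padded_row(vec, max_len, threshold=0.0):
    """Hypothetical helper: a 1 x max_len COO row holding vec's entries with |x| >= threshold."""
    v = np.asarray(vec, dtype=float).ravel()       # accepts shape (n,) or (1, n)
    v = np.where(np.abs(v) < threshold, 0.0, v)    # apply the threshold from the answer
    cols = np.flatnonzero(v)                       # column index of every remaining nonzero
    rows = np.zeros_like(cols)                     # all entries live in row 0
    # the explicit shape pads the row with implicit zeros up to max_len
    return ss.coo_matrix((v[cols], (rows, cols)), shape=(1, max_len))


# hypothetical example input with unequal lengths
vectors = [np.array([[0.5, -0.005, 2.0]]),
           np.array([[1.0, 0.0, 0.3, -4.0, 0.002]]),
           np.array([[-0.7]])]
max_len = max(np.asarray(v).size for v in vectors)

m = ss.vstack([to_padded_row(v, max_len, threshold=0.01) for v in vectors]).tocsr()
print(m.shape)   # (3, 5)
print(m.nnz)     # 6
```

Only the entries that survive the threshold are stored, so the padding costs no memory.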

Upvotes: 2

Jaime
Jaime

Reputation: 67417

A little convoluted, but I would probably do it like this:

    >>> import numpy as np
    >>> import scipy.sparse as sps
    >>> a = [np.arange(5), np.arange(7), np.arange(3)]
    >>> lens = [len(j) for j in a]
    >>> cols = np.concatenate([np.arange(j) for j in lens])
    >>> rows = np.concatenate([np.repeat(j, len_) for j, len_ in enumerate(lens)])
    >>> data = np.concatenate(a)
    >>> b = sps.coo_matrix((data, (rows, cols)), shape=(len(a), max(lens)))
    >>> b.toarray()
    array([[0, 1, 2, 3, 4, 0, 0],
           [0, 1, 2, 3, 4, 5, 6],
           [0, 1, 2, 0, 0, 0, 0]])
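The same construction can be wrapped in a function. A sketch, assuming vectors shaped (1, n) as in the question (flattened with `ravel`) and an optional fixed maximum length `max_len` (a hypothetical parameter): passing `shape=` explicitly keeps the column count at that maximum even when no vector in the list reaches it. Unlike the session above, it also drops the zero values before building the matrix, so they are not stored explicitly. The name `ragged_to_csr` is hypothetical.

```python
import numpy as np
import scipy.sparse as sps


def ragged_to_csr(vectors, max_len=None):
    """Hypothetical helper: stack ragged vectors as zero-padded rows of a CSR matrix."""
    flat = [np.asarray(v).ravel() for v in vectors]   # (1, n) -> (n,)
    lens = np.array([f.size for f in flat])
    if max_len is None:
        max_len = int(lens.max())
    rows = np.repeat(np.arange(len(flat)), lens)      # row i repeated len_i times
    cols = np.concatenate([np.arange(n) for n in lens])
    data = np.concatenate(flat)
    keep = data != 0                                  # do not store explicit zeros
    return sps.coo_matrix((data[keep], (rows[keep], cols[keep])),
                          shape=(len(flat), max_len)).tocsr()


a = [np.arange(5), np.arange(7), np.arange(3)]
b = ragged_to_csr(a, max_len=8)
print(b.shape)   # (3, 8)
print(b.nnz)     # 12
```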

Upvotes: 1

DrRobotNinja
DrRobotNinja

Reputation: 1421

Try this:

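    import numpy as np
    from scipy import sparse

    vectors = [np.random.randn(1, n) for n in (5, 7, 3)]  # example input
    num_of_vectors = len(vectors)
    max_vector_size = max(v.size for v in vectors)

    M = sparse.lil_matrix((num_of_vectors, max_vector_size))
    for i, v in enumerate(vectors):
        M[i, :v.size] = v  # entries beyond v.size stay zero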

For an overview of the available sparse formats, see the SciPy documentation: http://docs.scipy.org/doc/scipy/reference/sparse.html

The lil_matrix format is good for constructing the matrix incrementally, but you'll want to convert it to another format such as csr_matrix (for example with M.tocsr()) before doing arithmetic with it.
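An alternative that is not part of this answer is to skip LIL entirely and build the CSR matrix directly from the concatenated vectors. In CSR, row i's values are `data[indptr[i]:indptr[i+1]]` with column positions in `indices`, so for zero-padded rows `indptr` is just the running sum of the vector lengths. A sketch under that assumption; the sample vectors and `max_vector_size` are hypothetical.

```python
import numpy as np
from scipy import sparse

# hypothetical example input: 1 x n arrays of different lengths
vectors = [np.array([[1.0, 0.0, 2.0]]),
           np.array([[3.0, 4.0, 0.0, 5.0, 6.0]]),
           np.array([[7.0]])]
max_vector_size = 6  # hypothetical fixed maximum length

flat = [np.asarray(v).ravel() for v in vectors]
lens = np.array([f.size for f in flat])
data = np.concatenate(flat)                           # all values, row after row
indices = np.concatenate([np.arange(n) for n in lens])  # column of each value
indptr = np.concatenate([[0], np.cumsum(lens)])        # row i spans indptr[i]:indptr[i+1]

M = sparse.csr_matrix((data, indices, indptr),
                      shape=(len(flat), max_vector_size))
M.eliminate_zeros()  # drop the zeros copied over from the dense vectors

y = M @ np.ones(max_vector_size)  # CSR supports fast matrix-vector products
```

This avoids the per-row Python loop and the LIL-to-CSR conversion, at the cost of holding the concatenated data in memory once.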

Upvotes: 3
