Reputation: 9680
I'm trying to build a single matrix from many 1xN sparse matrices in a fast and efficient way, for later use as features in scikit-learn training. One of the many things I've tried so far is:
np.matrix(list(func(text) for text in data_test.data))
Which creates a matrix of matrices, like this:
matrix([[ <1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 10921 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 17651 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 28180 stored elements in Compressed Sparse Row format>,...
This obviously isn't what I'm looking for. How can I turn it into a single proper sparse matrix, like this:
<76002x108800 sparse matrix of type '<type 'numpy.float64'>'
with 807960 stored elements in Compressed Sparse Row format>
Upvotes: 0
Views: 89
Reputation: 35125
How about scipy.sparse.vstack, which stacks sparse matrices row-wise into one sparse matrix: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.sparse.vstack.html
If that's too slow, take the fast path from here: https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L396 (in future SciPy versions, vstack itself will be fast in this case).
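To illustrate both suggestions, here is a minimal sketch. The three small rows stand in for whatever `func(text)` returns in the question, and `stack_csr_rows` is a hypothetical helper name, not part of SciPy. It assumes every input is (or converts to) a CSR matrix with the same number of columns, and shows the general idea of a fast path: concatenating the CSR arrays directly instead of going through a generic block-matrix builder.

```python
import numpy as np
import scipy.sparse as sp

# Stand-ins for the 1xN CSR rows that func(text) returns in the question.
rows = [
    sp.csr_matrix(np.array([[0.0, 1.0, 0.0, 2.0]])),
    sp.csr_matrix(np.array([[3.0, 0.0, 0.0, 0.0]])),
    sp.csr_matrix(np.array([[0.0, 0.0, 4.0, 5.0]])),
]

# Option 1: let SciPy stack the rows vertically.
stacked = sp.vstack(rows, format="csr")


def stack_csr_rows(blocks):
    """Hypothetical helper: stack CSR blocks with equal column counts
    by concatenating their data/indices/indptr arrays directly."""
    blocks = [sp.csr_matrix(b) for b in blocks]
    n_cols = blocks[0].shape[1]
    if any(b.shape[1] != n_cols for b in blocks):
        raise ValueError("all blocks must have the same number of columns")

    data = np.concatenate([b.data for b in blocks])
    indices = np.concatenate([b.indices for b in blocks])

    # Shift each block's row pointers by the number of stored
    # elements that precede it in the combined arrays.
    indptr_parts = [np.array([0])]
    offset = 0
    for b in blocks:
        indptr_parts.append(b.indptr[1:] + offset)
        offset += b.indptr[-1]
    indptr = np.concatenate(indptr_parts)

    n_rows = sum(b.shape[0] for b in blocks)
    return sp.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))


# Option 2: the manual concatenation.
fast = stack_csr_rows(rows)
```

Both `stacked` and `fast` are single 3x4 CSR matrices here. With the question's data the call would be something like `sp.vstack([func(text) for text in data_test.data], format="csr")`, which can be passed to scikit-learn directly.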
Upvotes: 2