Giorgos Myrianthous

Reputation: 39900

How to concatenate a coo_matrix with a column numpy array

I have a coo_matrix a with shape (40106, 2048) and a column numpy array b with shape (40106,).

What I want to do is simply concatenate the matrix and the array, i.e. the resulting data structure should have shape (40106, 2049). I've tried to use hstack as shown below:

concat = hstack([a, b])

but I get the following error:

File "/Users/usr/anaconda/lib/python3.5/site-packages/scipy/sparse/construct.py", line 464, in hstack
    return bmat([blocks], format=format, dtype=dtype)
File "/Users/usr/anaconda/lib/python3.5/site-packages/scipy/sparse/construct.py", line 581, in bmat
    'row dimensions' % i)
ValueError: blocks[0,:] has incompatible row dimensions

I don't quite get why the dimensions do not match since both a and b have the same number of rows.

Upvotes: 2

Views: 1053

Answers (2)

hpaulj

Reputation: 231550

I assume that's sparse.hstack. When your 1-D b is converted to a sparse matrix it becomes a single row of shape (1, 40106), so its row count (1) does not match the 40106 rows of a. Turn it into a sparse matrix of the correct shape, (40106, 1), before passing it to hstack. hstack passes the job to bmat, which ends up joining the coo attributes of all the input matrices into a new matrix:

In [65]: import numpy as np
In [66]: from scipy import sparse
In [67]: A = sparse.coo_matrix(np.eye(3))
In [68]: b = np.ones(3)
In [69]: sparse.hstack((A,b))
....
ValueError: blocks[0,:] has incompatible row dimensions
In [70]: B=sparse.coo_matrix(b)
In [71]: B
Out[71]: 
<1x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in COOrdinate format>
In [72]: sparse.hstack((A,B.T))
Out[72]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in COOrdinate format>
In [73]: _.toarray()
Out[73]: 
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  1.,  1.]])

This also works (as in Divakar's answer):

In [74]: sparse.hstack((A,b[:,None]))
Out[74]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in COOrdinate format>

My hstack does:

return bmat([blocks], format=format, dtype=dtype)

So a direct call to bmat also works:

In [93]: sparse.bmat([[A, B.T]])
Out[93]: 
<3x4 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in COOrdinate format>

sparse.bmat([A, B.T]), without the nested list, produces a "blocks must be 2-D" error instead.
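
To illustrate what "joining the coo attributes" means, here is a hand-rolled sketch of a horizontal join for two COO matrices. It is a simplified illustration under my own assumptions, not SciPy's actual bmat code; the names manual and reference are made up for this example.

```python
import numpy as np
from scipy import sparse

A = sparse.coo_matrix(np.eye(3))
b = np.ones(3)
B = sparse.coo_matrix(b.reshape(-1, 1))   # column matrix, shape (3, 1)

# Keep A's coordinates as they are, shift B's column indices past the
# last column of A, then build one COO matrix from the combined pieces.
row = np.concatenate([A.row, B.row])
col = np.concatenate([A.col, B.col + A.shape[1]])
data = np.concatenate([A.data, B.data])
manual = sparse.coo_matrix(
    (data, (row, col)),
    shape=(A.shape[0], A.shape[1] + B.shape[1]),
)

# Compare against the library result
reference = sparse.hstack((A, B))
print(np.array_equal(manual.toarray(), reference.toarray()))
```

This only works because both blocks have the same number of rows, which is exactly the condition the ValueError is checking.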

Upvotes: 1

Divakar

Reputation: 221634

Convert the second array, which is 1D, to 2D and then use hstack -

hstack([A,B[:,None]])

Sample run -

In [86]: import numpy as np
    ...: from scipy.sparse import coo_matrix, hstack

# Sample inputs as a coo_matrix and an array
In [87]: A = coo_matrix([[1, 2, 0], [3, 0, 4]])
    ...: B = np.array([5, 6])
    ...: 

# Use proposed solution
In [88]: out = hstack([A,B[:,None]])

# Print the dense version to visually verify
In [89]: out.toarray()
Out[89]: 
array([[1, 2, 0, 5],
       [3, 0, 4, 6]])

Upvotes: 1
