mathguy

Reputation: 1518

NumPy hstack's weird behavior

A little background: NumPy v1.16, Python 3.6.8.

I run the following code:

import numpy as np

arr1 = np.repeat(True,20)
arr2 = np.repeat(np.arange(5),4)

X = np.vstack((arr1,
               arr2 
               )).T

arr3 = np.repeat(True,20).T
arr4 = np.repeat(np.arange(5),4).T

Y = np.hstack((arr3,
               arr4 
               ))

The result is that X.shape is (20, 2), which is what I expect, but Y.shape is (40,), which is not.

Mathematically, X and Y should be exactly the same matrix, but on my machine they aren't. What am I missing here? Thank you in advance.

Upvotes: 0

Views: 232

Answers (3)

hpaulj

Reputation: 231385

In [92]: arr1 = np.repeat(True,10) 
    ...: arr2 = np.repeat(np.arange(5),2)                                                                      
In [93]: arr1.shape                                                             
Out[93]: (10,)
In [94]: arr2.shape                                                             
Out[94]: (10,)

Transpose switches axes, but does not add any.

In [95]: arr1.T.shape                                                           
Out[95]: (10,)

vstack (vertical) makes sure the inputs are at least 2d, and joins them on the first axis:

In [96]: np.vstack((arr1,arr2))                                                 
Out[96]: 
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]])
In [97]: _.shape                                                                
Out[97]: (2, 10)

Effectively it does:

In [99]: np.concatenate((arr1.reshape(1,-1),arr2.reshape(1,-1)), axis=0)        
Out[99]: 
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]])

Note that the boolean True has been changed to numeric 1 so it has the same dtype as arr2.
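As a rough check of that equivalence, the sketch below rebuilds the vstack result by hand with np.atleast_2d and np.concatenate, and also looks at the dtype promotion. The names rows, manual and stacked are only illustrative.

```python
import numpy as np

arr1 = np.repeat(True, 10)
arr2 = np.repeat(np.arange(5), 2)

# atleast_2d turns each (10,) array into a (1, 10) row
rows = [np.atleast_2d(a) for a in (arr1, arr2)]
manual = np.concatenate(rows, axis=0)
stacked = np.vstack((arr1, arr2))

print(manual.shape)                     # (2, 10)
print(np.array_equal(manual, stacked))  # True
# the bool input is promoted to the integer dtype of arr2
print(arr1.dtype, stacked.dtype == arr2.dtype)
```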

hstack makes sure the inputs have at least 1 dimension, and joins on the second axis (or on the first, when the inputs are 1d). [source]

In [100]: np.hstack((arr1,arr2))                                                
Out[100]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
In [101]: _.shape                                                               
Out[101]: (20,)

Again, transpose doesn't change the 1d shape.
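A minimal sketch of that behaviour, reusing the same arr1 and arr2: with 1d inputs hstack gives the same result as concatenating along axis 0, and it only produces two columns once each input already has shape (n, 1). The names flat, same and cols are only illustrative.

```python
import numpy as np

arr1 = np.repeat(True, 10)
arr2 = np.repeat(np.arange(5), 2)

# 1d inputs: hstack just joins them end to end
flat = np.hstack((arr1, arr2))
same = np.concatenate((arr1, arr2), axis=0)
print(flat.shape, np.array_equal(flat, same))  # (20,) True

# .T is a no-op on 1d arrays, so this is still (20,)
print(np.hstack((arr1.T, arr2.T)).shape)       # (20,)

# explicit (10, 1) columns give the two-column result
cols = np.hstack((arr1.reshape(-1, 1), arr2.reshape(-1, 1)))
print(cols.shape)                              # (10, 2)
```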

Another convenience function:

In [102]: np.column_stack((arr1,arr2)).shape                                    
Out[102]: (10, 2)

This makes the inputs 2d and joins on the last axis (look at its code for details).

Yet another convenience function:

In [103]: np.stack((arr1,arr2),axis=1).shape                                    
Out[103]: (10, 2)
In [104]: np.stack((arr1,arr2),axis=0).shape                                    
Out[104]: (2, 10)

All of these just tweak the dimensions and then use concatenate.
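To illustrate, here is a sketch that reproduces the stack(..., axis=1) and column_stack results by adding the axis by hand and calling np.concatenate directly. It is not the library's actual source, just the same idea; expanded and manual are illustrative names.

```python
import numpy as np

arr1 = np.repeat(True, 10)
arr2 = np.repeat(np.arange(5), 2)

# add a trailing axis to each input: (10,) -> (10, 1)
expanded = [a[:, np.newaxis] for a in (arr1, arr2)]
manual = np.concatenate(expanded, axis=1)

print(manual.shape)                                                # (10, 2)
print(np.array_equal(manual, np.stack((arr1, arr2), axis=1)))      # True
print(np.array_equal(manual, np.column_stack((arr1, arr2))))       # True
```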

A structured array is another option; each field keeps its own dtype:

In [110]: arr = np.zeros((10,), dtype='bool,i')                                 
In [111]: arr['f0']=arr1                                                        
In [112]: arr['f1']=arr2                                                        
In [113]: arr                                                                   
Out[113]: 
array([( True, 0), ( True, 0), ( True, 1), ( True, 1), ( True, 2),
       ( True, 2), ( True, 3), ( True, 3), ( True, 4), ( True, 4)],
      dtype=[('f0', '?'), ('f1', '<i4')])

Upvotes: 1

BENY

Reputation: 323226

Your problem is that, even with .T, your array is one-dimensional, with shape (n,), which means you cannot simply transpose it to get shape (n, 1).

How to fix it: index with None (np.newaxis) to get shape (n, 1):

Y = np.hstack((arr3[:,None],
               arr4[:,None] 
               ))
Y
Out[14]: 
array([[1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 3],
       [1, 3],
       [1, 3],
       [1, 3],
       [1, 4],
       [1, 4],
       [1, 4],
       [1, 4]])
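A quick check, rebuilding the arrays from the question, that this Y now matches the X obtained from vstack(...).T:

```python
import numpy as np

arr1 = np.repeat(True, 20)
arr2 = np.repeat(np.arange(5), 4)

X = np.vstack((arr1, arr2)).T
Y = np.hstack((arr1[:, None], arr2[:, None]))

print(X.shape, Y.shape)      # (20, 2) (20, 2)
print(np.array_equal(X, Y))  # True
```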

Upvotes: 1

James

Reputation: 36608

Transposing 1-d arrays such as arr3 and arr4 returns a 1-d array, not a 2-d array.

np.repeat(True,5)
# returns:
array([ True,  True,  True,  True,  True])

np.repeat(True,5).T
# returns:
array([ True,  True,  True,  True,  True])

It does not produce a new axis. You need to do that before transposing.

To increase the number of axes, you can use np.newaxis.

a = np.repeat(True, 5)
a[:, np.newaxis]
# returns:
array([[ True],
       [ True],
       [ True],
       [ True],
       [ True]])

a[:, np.newaxis].T
# returns:
array([[ True,  True,  True,  True,  True]])
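For comparison, a small sketch of two other common ways to get the same (5, 1) column: reshape and np.expand_dims. The names col_a, col_b and col_c are only illustrative.

```python
import numpy as np

a = np.repeat(True, 5)

col_a = a[:, np.newaxis]           # index with np.newaxis
col_b = a.reshape(-1, 1)           # -1 lets NumPy infer the length
col_c = np.expand_dims(a, axis=1)  # insert a new axis at position 1

print(col_a.shape, col_b.shape, col_c.shape)  # (5, 1) (5, 1) (5, 1)
print(col_a.T.shape)                          # (1, 5)
```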

Upvotes: 4
