Chob

Reputation: 93

How do np.vstack and np.hstack work in Python?

I don't understand why I should use np.hstack to combine the y vectors:

y_combined=np.hstack((y_train, y_test))

and not np.vstack. When I use np.vstack, I get this error:

ValueError: all the input array dimensions for the concatenation axis must
match exactly, but along dimension 1, the array at index 0 has size 105
and the array at index 1 has size 45

I don't get that error when I use np.hstack. Why does this happen?

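import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

iris = datasets.load_iris()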
X=iris.data[:,[2,3]]
y=iris.target

X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc= StandardScaler()
sc.fit(X_train)
X_train_std=sc.transform(X_train)
X_test_std= sc.transform(X_test)

ppn= Perceptron( max_iter=40,eta0= 0.1, random_state=1)
ppn.fit(X_train_std, y_train)

y_pred= ppn.predict(X_test_std)
def plot_decision_regions(X, y, classifier,test_idx=None, resolution = 0.02):
    markers = ('s', 'x', 'o', '^','v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    x1_min, x1_max = X[:, 0].min() -1, X[:,0].max() + 1
    x2_min, x2_max = X[:, 1].min() -1, X[:,1].max() + 1
    xx1, xx2= np.meshgrid (np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha= 0.3, cmap = cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())


    for idx, cl in enumerate (np.unique(y)):
        plt.scatter (x=X[y == cl, 0], y= X[y == cl, 1], alpha=0.8, c=colors[idx], marker= markers [idx], label = cl, edgecolor = 'black')

    if test_idx:
        X_test, y_test= X[test_idx,:], y[test_idx]


        # c='none' draws hollow markers; an empty string is rejected by recent matplotlib versions
        plt.scatter(X_test[:,0], X_test[:,1], c='none', edgecolor= 'black', alpha= 0.9, linewidth=1, marker='o', s=100, label='test set' )


X_combined_std= np.vstack((X_train_std, X_test_std))
y_combined=np.hstack((y_train, y_test))
plot_decision_regions(X=X_combined_std, y=y_combined, classifier=ppn, test_idx=range(105,150))
# Columns 2 and 3 of iris.data are petal length and petal width
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()

Upvotes: 0

Views: 1139

Answers (1)

norok2

Reputation: 26886

Assume we have two arrays of shape (2, 3) each, say:

a = np.array([[11, 12, 13], [14, 15, 16]])
b = np.array([[17, 18, 19], [20, 21, 22]])

Both hstack() and vstack() would stack the two arrays, but along different dimensions:

np.vstack((a, b))
# array([[11, 12, 13],
#        [14, 15, 16],
#        [17, 18, 19],
#        [20, 21, 22]])

np.hstack((a, b))
# array([[11, 12, 13, 17, 18, 19],
#        [14, 15, 16, 20, 21, 22]])
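To make the difference explicit, here is a small sketch that checks the resulting shapes and compares both calls with np.concatenate along axis 0 and axis 1. The variable names v, h, same_v and same_h are only for illustration.

```python
import numpy as np

a = np.array([[11, 12, 13], [14, 15, 16]])
b = np.array([[17, 18, 19], [20, 21, 22]])

v = np.vstack((a, b))  # rows of b are appended below the rows of a
h = np.hstack((a, b))  # columns of b are appended to the right of a

print(v.shape)  # (4, 3)
print(h.shape)  # (2, 6)

# For 2-D inputs, the two functions match concatenation along axis 0 and axis 1
same_v = np.array_equal(v, np.concatenate((a, b), axis=0))
same_h = np.array_equal(h, np.concatenate((a, b), axis=1))
print(same_v, same_h)  # True True
```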

Now you can do both hstack() and vstack() because a and b do have the same shape, but what is the condition on the shapes if they are not the same?

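For vstack, the second dimension (index 1) must match, while for hstack, it is the first dimension (index 0) that must match. The error you are getting is telling you precisely this: vstack first promotes 1-D inputs to 2-D, so y_train and y_test are treated as shapes (1, 105) and (1, 45), which differ along dimension 1. hstack joins 1-D arrays end to end, so their lengths do not need to match.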
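The 1-D case from the question can be reproduced with a minimal sketch. The arrays below are placeholders filled with zeros and ones, not the real iris labels; only their lengths (105 and 45) are taken from the error message.

```python
import numpy as np

# Placeholder label vectors with the same lengths as y_train and y_test
y_train = np.zeros(105, dtype=int)
y_test = np.ones(45, dtype=int)

# hstack joins 1-D arrays end to end
y_combined = np.hstack((y_train, y_test))
print(y_combined.shape)  # (150,)

# vstack treats each 1-D array as a single row, as np.atleast_2d does
print(np.atleast_2d(y_train).shape)  # (1, 105)
print(np.atleast_2d(y_test).shape)   # (1, 45)

# Rows of different lengths cannot be stacked on top of each other
try:
    np.vstack((y_train, y_test))
    vstack_failed = False
except ValueError as err:
    vstack_failed = True
    print(err)
```

X_train_std and X_test_std, by contrast, are 2-D with the same number of columns, which is why np.vstack works for them.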

Upvotes: 1
