Eric Kim

Reputation: 2698

Python numpy and pandas matrix dimensions

I have variables that look like this:

data.head()

   Ones  Population   Profit
0     1      6.1101  17.5920
1     1      5.5277   9.1302
2     1      8.5186  13.6620
3     1      7.0032  11.8540
4     1      5.8598   6.8233

X     = data.iloc[:, 0:cols]
y     = data.iloc[:, cols]

X1 = np.matrix(X.values)
y1 = np.matrix(y.values)

X.shape
>>(97, 2)
y.shape
>>(97,)

X1.shape
>>(97, 2)
y1.shape
>>(1, 97)

data is a pandas DataFrame.

I expected the shape of y1 to be 97 x 1, but instead it is 1 x 97. Somehow y1 was transposed along the way, and I don't understand why. Since I thought my original pandas y was 97 x 1, I expected y1 to be the same, but apparently that's not how it works.

Any explanations?

Upvotes: 1

Views: 1751

Answers (2)

Ray

Reputation: 184

y.values converts the column into a one-dimensional numpy array, like

[1, 2, 3, 4, 5]

If you call np.matrix on that array, it returns a matrix with a single row:

[[1, 2, 3, 4, 5]]

However, if you reshape the one-dimensional array into two dimensions before calling np.matrix, you get a (5, 1) matrix:

>>> a = np.array([1, 2, 3, 4, 5])
>>> a.shape
(5,)
>>> a
array([1, 2, 3, 4, 5])
>>> np.matrix(a).shape
(1, 5)

>>> a.reshape(-1, 1)
array([[1],
       [2],
       [3],
       [4],
       [5]])
>>> np.matrix(a.reshape(-1, 1)).shape
(5, 1)

>>> np.matrix(a.reshape(-1, 1))
matrix([[1],
        [2],
        [3],
        [4],
        [5]])
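Applied to the question's variables, the same reshape might look like the sketch below. The DataFrame is a hypothetical stand-in built from the five rows shown by data.head(), and cols is assumed to be the index of the last column, since the question does not show how it was defined.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: only the five rows from data.head()
data = pd.DataFrame({
    "Ones": [1, 1, 1, 1, 1],
    "Population": [6.1101, 5.5277, 8.5186, 7.0032, 5.8598],
    "Profit": [17.5920, 9.1302, 13.6620, 11.8540, 6.8233],
})
cols = data.shape[1] - 1  # assumption: the last column is the target

y = data.iloc[:, cols]                       # Series, one-dimensional: shape (5,)
y_row = np.matrix(y.values)                  # 1-d input becomes a single row: (1, 5)
y_col = np.matrix(y.values.reshape(-1, 1))   # reshape first to get a column: (5, 1)

# Alternative: slice the column so pandas keeps it two-dimensional
y_col_alt = np.matrix(data.iloc[:, cols:cols + 1].values)  # (5, 1)

print(y.shape, y_row.shape, y_col.shape, y_col_alt.shape)
```

With the full 97-row data, the two column forms would be expected to have shape (97, 1).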

Upvotes: 1

chrisb

Reputation: 52236

Unsolicited advice: using np.matrix isn't really recommended. The biggest thing it bought you was the * operator for matrix multiplication, but with the @ matmul operator introduced in Python 3.5, that's not really necessary.

That said, the key thing to note here is that the shape of y is not 97 x 1; it is (97,), that is, a one-dimensional array. A numpy matrix is always two-dimensional, and simply by convention a 1-d array of length N is converted into a 1 x N matrix.
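As a rough illustration of working with plain arrays and @ instead of np.matrix, here is a minimal sketch. The values of X and y are the first three rows from the question, and theta is a made-up parameter vector that does not appear in the original post.

```python
import numpy as np

# First three rows of the question's data, as plain ndarrays
X = np.array([[1.0, 6.1101],
              [1.0, 5.5277],
              [1.0, 8.5186]])           # shape (3, 2)
y = np.array([17.5920, 9.1302, 13.6620])  # shape (3,)

# Hypothetical parameters, used only for illustration
theta = np.zeros(2)                     # shape (2,)

# With 1-d vectors, @ returns a 1-d result
residual = X @ theta - y                # shape (3,)

# With explicit column vectors, everything stays two-dimensional
y_col = y.reshape(-1, 1)                # shape (3, 1)
theta_col = np.zeros((2, 1))            # shape (2, 1)
residual_col = X @ theta_col - y_col    # shape (3, 1)

# Caution: mixing a (3, 1) column with a (3,) vector broadcasts to (3, 3)
mixed = X @ theta_col - y

print(residual.shape, residual_col.shape, mixed.shape)
```

Either convention works as long as it is applied consistently; the broadcast in the last step is the kind of silent shape change worth checking for with .shape.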

Upvotes: 1
