Reputation: 325
X_train[['x_1', 'x_2']].values
> array([[array([ 8, 14, 28, 101, 49, 11, 48, 32, 75, 88]),
array([107, 23, 75, 88, 53, 120, 114, 112, 11, 30])],
[array([107, 23, 75, 88, 53, 120, 114, 112, 11, 30]),
array([ 8, 14, 28, 101, 49, 11, 48, 32, 75, 88])],
[array([ 40, 46, 21, 67, 17, 167, 125, 165, 89, 90]),
array([ 10, 58, 73, 61, 94, 46, 122, 46, 6, 15])],
...,
[array([ 778, 356, 1091, 912, 866, 763, 170, 456, 539, 1059]),
array([ 434, 992, 1437, 980, 949, 916, 714, 2000, 2000, 768])],
[array([ 583, 90, 666, 224, 819, 154, 1399, 340, 99, 201]),
array([1051, 663, 1018, 581, 1188, 2000, 867, 211, 441, 660])],
[array([1051, 663, 1018, 581, 1188, 2000, 867, 211, 441, 660]),
array([ 583, 90, 666, 224, 819, 154, 1399, 340, 99, 201])]],
dtype=object)
I converted this partial dataframe into a numpy array, but it is giving me a 2D shape.
X_train[['x_1', 'x_2']].shape
> (335334, 2)
However, if I copy and paste the output into a Jupyter block, I get a 3D shape all of a sudden.
np.array([[np.array([ 8, 14, 28, 101, 49, 11, 48, 32, 75, 88]),
np.array([107, 23, 75, 88, 53, 120, 114, 112, 11, 30])],
[np.array([107, 23, 75, 88, 53, 120, 114, 112, 11, 30]),
np.array([ 8, 14, 28, 101, 49, 11, 48, 32, 75, 88])],
[np.array([ 40, 46, 21, 67, 17, 167, 125, 165, 89, 90]),
np.array([ 10, 58, 73, 61, 94, 46, 122, 46, 6, 15])]
]).shape
> (3, 2, 10)
Upvotes: 0
Views: 58
Reputation: 231625
Making a dataframe like yours:
In [6]: df['C1']=[np.arange(i,i+3) for i in range(3)]
In [7]: df['C2']=[np.arange(i,i+3) for i in range(5,8)]
In [8]: df
Out[8]:
C1 C2
0 [0, 1, 2] [5, 6, 7]
1 [1, 2, 3] [6, 7, 8]
2 [2, 3, 4] [7, 8, 9]
In [9]: df.values
Out[9]:
array([[array([0, 1, 2]), array([5, 6, 7])],
[array([1, 2, 3]), array([6, 7, 8])],
[array([2, 3, 4]), array([7, 8, 9])]], dtype=object)
One column, a Series, does have a to_list method:
In [12]: df['C1'].to_list()
Out[12]: [array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4])]
In [13]: np.array(df['C1'].to_list())
Out[13]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
but a dataframe does not. That makes sense, since a list is 1d and a frame is 2d.
values (or to_numpy()) is a 2d object array:
In [14]: df.values
Out[14]:
array([[array([0, 1, 2]), array([5, 6, 7])],
[array([1, 2, 3]), array([6, 7, 8])],
[array([2, 3, 4]), array([7, 8, 9])]], dtype=object)
np.array(df.values) doesn't change that. But we can make a nested list from it:
In [15]: df.values.tolist()
Out[15]:
[[array([0, 1, 2]), array([5, 6, 7])],
[array([1, 2, 3]), array([6, 7, 8])],
[array([2, 3, 4]), array([7, 8, 9])]]
and recreate an array from that:
In [16]: np.array(df.values.tolist())
Out[16]:
array([[[0, 1, 2],
[5, 6, 7]],
[[1, 2, 3],
[6, 7, 8]],
[[2, 3, 4],
[7, 8, 9]]])
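For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch of the same idea. The frame is a small stand-in for the one in the question, and it assumes every cell holds a 1d array of the same length (names like obj and arr are just for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in frame: every cell holds a 1d array of the same length.
df = pd.DataFrame({
    "C1": [np.arange(i, i + 3) for i in range(3)],
    "C2": [np.arange(i, i + 3) for i in range(5, 8)],
})

# to_numpy() (like .values) gives a 2d object array whose elements are arrays,
# so its shape only reflects rows and columns.
obj = df.to_numpy()
print(obj.shape, obj.dtype)

# Going through a nested list lets np.array see the inner arrays
# and build a regular 3d numeric array from them.
arr = np.array(obj.tolist())
print(arr.shape)
```

If the inner arrays differ in length, this will not produce a clean 3d array, so it is worth checking the lengths first.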
Copying and pasting the printed output effectively does the same thing: it builds the array from a nested list of arrays.
np.stack (or vstack) can join the arrays of a Series:
In [20]: np.stack(df['C1'])
Out[20]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
np.stack doesn't give the 3d result when applied directly to the dataframe's 2d object array, but it does work on a raveled array:
np.stack(df.values.ravel()).reshape(-1,2,3)
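The reshape above hard-codes the column count and the inner length. A hedged variant that takes the first two dimensions from the frame itself might look like this; again the frame is a stand-in, and it assumes all cell arrays share one length (otherwise np.stack raises a ValueError):

```python
import numpy as np
import pandas as pd

# Stand-in frame: every cell holds a 1d array of the same length.
df = pd.DataFrame({
    "C1": [np.arange(i, i + 3) for i in range(3)],
    "C2": [np.arange(i, i + 3) for i in range(5, 8)],
})

# Flatten the 2d object array row by row into a 1d sequence of arrays.
flat = df.to_numpy().ravel()

# Stack the inner arrays into a regular 2d array, one row per cell.
stacked = np.stack(flat)

# Restore the (rows, columns) layout; -1 infers the inner array length.
n_rows, n_cols = df.shape
result = stacked.reshape(n_rows, n_cols, -1)
print(result.shape)
```

Because ravel walks the object array in row-major order, result[i, j] lines up with the array stored in row i, column j of the frame.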
Upvotes: 1