Reputation: 1146
Assume I have a 2d array.
a = np.array([[0,2,3],[4,2,1]])
The dimension is number_of_instances * 3, where the values in the 2d array represent the row index in a pandas dataframe.
I have a dataframe:
df = pd.DataFrame(np.array([[10, 10, 10, 10], [11, 11, 11, 11], [12, 12, 12, 12], [13, 13, 13, 13], [14, 14, 14, 14]]), columns = list('ABCD'))
Out[23]:
    A   B   C   D
0  10  10  10  10
1  11  11  11  11
2  12  12  12  12
3  13  13  13  13
4  14  14  14  14
Now I have an empty 3d array, and I am trying to fill it with the values from the pandas dataframe.
b = np.empty((2, 3, 4))
The dimension is number_of_instances * 3 * number_of_features, where the features are taken from the pandas dataframe row given by the corresponding row index in the 2d array.
Ideally, I would expect b to look like:
Out[24]:
array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],
       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
What is the most efficient way to fill this 3d array?
Upvotes: 1
Views: 773
Reputation:
What you want is called advanced indexing in the official numpy documentation.
For your working example, do the following.
First, access the numpy array corresponding to the values of the dataframe by calling df.values. Then, simply do:
df.values[[[0,2,3],[4,2,1]],:]
And you are done.
The above indexing passes two index objects to the array. The first is [[0,2,3],[4,2,1]], the second is :. The first indexes axis 0 (rows), the second indexes axis 1 (columns).
The : symbol just returns all columns.
Now, for the rows, you have a list of two lists: [[0,2,3],[4,2,1]]. This construction returns two 2d blocks stacked into one 3d array, which is what you want. The first block has rows 0, 2 and 3, and the second has rows 4, 2 and 1.
Numpy is powerful. You can do a lot just by leveraging indexing.
Edit: observe that you already have the indices [[0,2,3],[4,2,1]] in the variable a. So df.values[a] will do it, as others mentioned. That's because the trailing : for the columns is optional in this case. But it is useful to see the full notation.
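To make the shape rule concrete, here is a small sketch using the question's df and a. It assumes a recent pandas where df.to_numpy() is available (df.values behaves the same here). The names explicit and implicit are just illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)
a = np.array([[0, 2, 3], [4, 2, 1]])

values = df.to_numpy()       # 2d array of shape (5, 4)

explicit = values[a, :]      # index rows with a, keep every column
implicit = values[a]         # same thing with the trailing ":" left out

# The result takes the shape of the index array, followed by the
# remaining (column) axis of the indexed array.
print(explicit.shape)                       # (2, 3, 4)
print(np.array_equal(explicit, implicit))   # True
```

The point is that the integer index array replaces the row axis with its own shape, which is why a (2, 3) index on a (5, 4) array gives (2, 3, 4).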
Upvotes: 0
Reputation: 150735
How about:
df.loc[a.ravel()].values.reshape((2,3,4))
Output:
array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],
       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
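A possible variant of the same idea, sketched under two assumptions that are not in the answer above: the target shape is derived from a and df instead of being hardcoded, and iloc is used so the lookup is positional even if the dataframe index is not 0..n-1. With the default index from the question, loc and iloc select the same rows.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)
a = np.array([[0, 2, 3], [4, 2, 1]])

# Flatten the index array, pick rows by position, then restore the
# leading dimensions of a and append the number of columns.
rows = df.iloc[a.ravel()]
b = rows.to_numpy().reshape(*a.shape, df.shape[1])

print(b.shape)  # (2, 3, 4)
```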
Upvotes: 1
Reputation: 59274
Looks like you just need indexing
df.to_numpy()[a]
array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],
       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
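If the goal is specifically to fill a preallocated b, as in the question, one way to do it is to assign the indexed result into it. This is only a sketch: the dtype choice and the comparison against np.take are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)
a = np.array([[0, 2, 3], [4, 2, 1]])

values = df.to_numpy()

# Preallocate with a shape derived from the inputs: (2, 3, 4) here.
b = np.empty(a.shape + (values.shape[1],), dtype=values.dtype)

# Fill the existing array in place instead of rebinding the name b.
b[...] = values[a]

# np.take along axis 0 is an equivalent way to express the same lookup.
same = np.array_equal(b, np.take(values, a, axis=0))
print(same)  # True
```

In most cases the preallocation is unnecessary, since values[a] already returns a new array of the right shape.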
Upvotes: 1