Bratt Swan

Reputation: 1146

What is an efficient way to fill a 3D array based on a 2D array?

Assume I have a 2d array.

a = np.array([[0,2,3],[4,2,1]])

Its shape is number_of_instances * 3, and each value in the 2D array is a row index into a pandas DataFrame.

I have a dataframe:

df = pd.DataFrame(np.array([[10, 10, 10, 10], [11, 11, 11, 11], [12, 12, 12, 12], [13, 13, 13, 13], [14, 14, 14, 14]]), columns = list('ABCD'))

Out[23]: 
   A   B   C   D
0  10  10  10  10
1  11  11  11  11
2  12  12  12  12
3  13  13  13  13
4  14  14  14  14

Now I have an empty 3D array that I am trying to fill with the values from the pandas DataFrame.

b = np.empty((2, 3, 4))

Its shape is number_of_instances * 3 * number_of_features, where the features for each entry are the DataFrame row selected by the corresponding row index in the 2D array.

Ideally, I would expect b to look like:

Out[24]:
array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],
       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])

What is the most efficient way to fill this 3d array?

Upvotes: 1

Views: 773

Answers (3)

user6851498

Reputation:

What you want is called advanced indexing in the official numpy documentation.

For your example, you can do the following.

First, access the numpy array corresponding to the values of the dataframe by calling df.values. Then, simply do:

df.values[[[0,2,3],[4,2,1]],:]

And you are done.

The indexing above passes two index objects to the array. The first is [[0,2,3],[4,2,1]], the second is :. The first indexes axis 0 (rows), the second indexes axis 1 (columns).

The : symbol just returns all columns.

Now, for the rows, you have a list of two lists: [[0,2,3],[4,2,1]]. This construction returns two stacked 2D blocks, which is what you want. The first block holds rows 0, 2 and 3, and the second holds rows 4, 2 and 1.

NumPy is powerful. You can do a lot just by leveraging indexing.

Edit: observe that you already have the indices [[0,2,3],[4,2,1]] in the variable a. So df.values[a] will do it, as others mentioned. That's because the trailing : for the columns is optional in this case. But it is useful to see the full notation.
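
As a minimal sketch of the indexing described above, reusing a and df from the question (the names values, explicit and short are only illustrative), the following checks that the explicit and short forms agree and that the result shape is the shape of the index array followed by the number of columns:

```python
import numpy as np
import pandas as pd

a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)

values = df.values  # 2D array of shape (5, 4)

# Explicit form: integer index array for the rows, full slice for the columns.
explicit = values[a, :]
# Short form: the trailing full slice can be left out.
short = values[a]

# The result shape is the index array's shape followed by the remaining axis.
assert explicit.shape == a.shape + (values.shape[1],)
assert np.array_equal(explicit, short)
```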

Upvotes: 0

Quang Hoang

Reputation: 150735

How about:

df.loc[a.ravel()].values.reshape((2,3,4))

Output:

array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],

       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
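
Two caveats may be worth keeping in mind with this approach. The shape (2,3,4) is hardcoded, and .loc selects by index label, which matches row position here only because df has the default 0..n-1 index. A hedged variant that derives the shape from the inputs and selects by position with .iloc could look like this (gather_rows is a hypothetical helper name, not a pandas function):

```python
import numpy as np
import pandas as pd

a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)


def gather_rows(frame, idx):
    """Hypothetical helper: stack DataFrame rows selected by position."""
    # Flatten the index array, pick the rows by position, then restore
    # the index array's shape with the column count as the last axis.
    flat = frame.iloc[idx.ravel()].to_numpy()
    return flat.reshape(idx.shape + (frame.shape[1],))


result = gather_rows(df, a)
```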

Upvotes: 1

rafaelc

Reputation: 59274

Looks like you just need indexing:

df.to_numpy()[a]

array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],

       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
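
If the goal is specifically to fill a preallocated b, as in the question, the indexed result can be assigned into it. The sketch below assumes b is created with the DataFrame's dtype, and compares the outcome against an explicit double loop and against np.take as sanity checks (the names values and expected are only illustrative):

```python
import numpy as np
import pandas as pd

a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)

values = df.to_numpy()

# Preallocate b; note that np.empty takes the shape as a single tuple.
b = np.empty(a.shape + (values.shape[1],), dtype=values.dtype)
b[...] = values[a]  # fill in place from the indexed rows

# Reference implementation: copy one row at a time.
expected = np.empty_like(b)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        expected[i, j] = values[a[i, j]]

assert np.array_equal(b, expected)
# np.take along axis 0 produces the same result.
assert np.array_equal(b, np.take(values, a, axis=0))
```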

Upvotes: 1
