Reputation: 2467
I have a NumPy array as below:
[[3.4, 87]
[5.5, 11]
[22, 3]
[4, 9.8]
[41, 11.22]
[32, 7.6]]
and I want to group the rows by three and, in each group, drop the row with the largest value in column 2.
For example, in the first 3 rows, the 3 values in column 2 are 87, 11 and 3, respectively, and I would like to keep 11 and 3.
The output numpy array I expected would be:
[[5.5, 11]
[22, 3]
[4, 9.8]
[32, 7.6]]
I am new to NumPy arrays, so please advise me on how to achieve this.
Upvotes: 0
Views: 302
Reputation: 879291
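import numpy as np

x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])
# group the rows by three: y has shape (2, 3, 2)
y = x.reshape(-1,3,2)
# index of the row with the largest column-2 value in each group
idx = y[..., 1].argmax(axis=1)
# True for every row that is not its group's maximum
mask = np.arange(3)[None, :] != idx[:, None]
y = y[mask]
print(y)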
# This might be helpful for the deleted part of your question
# y = y.reshape(-1,2,2)
# z = y[...,1]/y[...,1].sum(axis=1, keepdims=True)
# result = np.dstack([y, z[...,None]])
yields
[[ 5.5 11. ]
[ 22. 3. ]
[ 4. 9.8]
[ 32. 7.6]]
"Grouping by three" with NumPy can be done by reshaping the array to create a new axis of length 3 -- provided the original number of rows is divisible by 3:
In [92]: y = x.reshape(-1,3,2); y
Out[92]:
array([[[  3.4 ,  87.  ],
        [  5.5 ,  11.  ],
        [ 22.  ,   3.  ]],

       [[  4.  ,   9.8 ],
        [ 41.  ,  11.22],
        [ 32.  ,   7.6 ]]])
In [93]: y.shape
Out[93]: (2, 3, 2)
          |  |  |
          |  |  o--- 2 columns in each group
          |  o------ 3 rows in each group
          o--------- 2 groups
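The reshape only works when the number of rows is a multiple of 3. Here is a minimal sketch of what happens otherwise. The 7-row array is made up for illustration, and splitting off the leftover rows is just one possible workaround, not something the question asked for:

```python
import numpy as np

# Hypothetical data: 7 rows, which is not divisible by 3
x = np.arange(14, dtype=float).reshape(7, 2)

try:
    x.reshape(-1, 3, 2)
except ValueError as err:
    print("reshape failed:", err)

# Assumption: process only the complete groups and keep the
# leftover rows aside to handle however you see fit.
n_full = (len(x) // 3) * 3
groups = x[:n_full].reshape(-1, 3, 2)
leftover = x[n_full:]
print(groups.shape, leftover.shape)
```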
For each group, we can select the second column and find the row with the maximum value:
In [94]: idx = y[..., 1].argmax(axis=1); idx
Out[94]: array([0, 1])
array([0, 1]) indicates that in the first group, the row at index 0 contains the maximum (i.e. 87), and in the second group, the row at index 1 contains the maximum (i.e. 11.22).
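One thing to be aware of: if a group contains the maximum more than once, argmax reports only the first occurrence, so only that one row would be dropped. A small sketch with made-up data:

```python
import numpy as np

# Hypothetical group in which the maximum (5.0) appears twice in column 2
y = np.array([[[1.0, 5.0],
               [2.0, 5.0],
               [3.0, 1.0]]])

idx = y[..., 1].argmax(axis=1)
print(idx)  # index of the first row holding the maximum
```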
Next, we can generate a 2D boolean selection mask which is True where the rows do not contain the maximum value:
In [95]: mask = np.arange(3)[None, :] != idx[:, None]; mask
Out[95]:
array([[False,  True,  True],
       [ True, False,  True]], dtype=bool)
In [96]: mask.shape
Out[96]: (2, 3)
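The comparison relies on broadcasting: a (1, 3) array of row positions is compared against a (2, 1) array of per-group argmax indices, giving a (2, 3) result. A sketch that spells out the intermediate shapes (the names positions and targets are only for illustration):

```python
import numpy as np

idx = np.array([0, 1])              # argmax per group, from the step above

positions = np.arange(3)[None, :]   # shape (1, 3): row positions within a group
targets = idx[:, None]              # shape (2, 1): one argmax index per group

# Broadcasting stretches both operands to shape (2, 3) before comparing
mask = positions != targets
print(positions.shape, targets.shape, mask.shape)
print(mask)
```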
mask has shape (2,3). y has shape (2,3,2). If mask is used to index y as in y[mask], then the mask is aligned with the first two axes of y, and all values where mask is True are returned:
In [98]: y[mask]
Out[98]:
array([[  5.5,  11. ],
       [ 22. ,   3. ],
       [  4. ,   9.8],
       [ 32. ,   7.6]])
In [99]: y[mask].shape
Out[99]: (4, 2)
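If you would rather not build a boolean mask, an alternative (not the approach walked through above) is to turn the per-group argmax into row numbers of the original 2D array and pass them to np.delete. A minimal sketch, where group_size and flat_idx are illustrative names:

```python
import numpy as np

x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])

group_size = 3
y = x.reshape(-1, group_size, x.shape[1])
idx = y[..., 1].argmax(axis=1)

# Offset each within-group index by the group's starting row in x
flat_idx = idx + np.arange(len(idx)) * group_size

result = np.delete(x, flat_idx, axis=0)
print(result)
```

Both routes give the same four rows; np.delete just works on the original array directly instead of the reshaped one.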
By the way, the same calculation could be done using Pandas like this:
import numpy as np
import pandas as pd
x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])
df = pd.DataFrame(x)
idx = df.groupby(df.index // 3)[1].idxmax()
# drop the row with the maximum value in each group
df = df.drop(idx.values, axis=0)
which yields the DataFrame:
      0     1
1   5.5  11.0
2  22.0   3.0
3   4.0   9.8
5  32.0   7.6
You might find Pandas syntax easier to use, but for the above calculation NumPy is faster.
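If you want to check the speed difference on your own machine, here is a rough sketch using timeit. The function names with_numpy and with_pandas are just wrappers I made up around the two snippets above, and the timings will vary with array size and library versions, so no numbers are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

x = np.array([[3.4, 87],
              [5.5, 11],
              [22, 3],
              [4, 9.8],
              [41, 11.22],
              [32, 7.6]])

def with_numpy(x):
    # reshape into groups of 3, then mask out each group's maximum row
    y = x.reshape(-1, 3, x.shape[1])
    idx = y[..., 1].argmax(axis=1)
    mask = np.arange(3)[None, :] != idx[:, None]
    return y[mask]

def with_pandas(x):
    # group by index // 3, then drop the row label of each group's maximum
    df = pd.DataFrame(x)
    idx = df.groupby(df.index // 3)[1].idxmax()
    return df.drop(idx.values, axis=0).to_numpy()

# Both approaches should keep the same rows
same = np.array_equal(with_numpy(x), with_pandas(x))
print("same result:", same)

print("numpy :", timeit.timeit(lambda: with_numpy(x), number=200))
print("pandas:", timeit.timeit(lambda: with_pandas(x), number=200))
```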
Upvotes: 1