Reputation: 113
I have an (n by i by j) 3D NumPy array, a_3d_array, here of shape (2 by 5 by 3):
array([[[1, 2, 3],
[1, 1, 1],
[2, 2, 2],
[0, 3, 3],
[0, 0, 4]],
[[1, 2, 3],
[2, 2, 2],
[3, 3, 3],
[0, 4, 4],
[0, 0, 5]]]).
For each column j of each block n, I want to extract the last 2 non-zero elements, calculate their mean, and put the results in an (n by j) array. Currently I do this with a for loop:
import numpy as np
a_3d_array = np.array([[[1, 2, 3],
[1, 1, 1],
[2, 2, 2],
[0, 3, 3],
[0, 0, 4]],
[[1, 2, 3],
[2, 2, 2],
[3, 3, 3],
[0, 4, 4],
[0, 0, 5]]])
aveCol = np.zeros([2,3])
for n in range(2):
    for j in range(3):
        temp = a_3d_array[n,:,j]
        nonzero_array = temp[np.nonzero(temp)]
        aveCol[n, j] = np.mean(nonzero_array[-2:])
to get the desired results
print(aveCol)
[[1.5 2.5 3.5]
 [2.5 3.5 4.5]]
That works fine, but I wonder whether there is a better, more Pythonic way of doing the same thing.
The most similar question I found is here, but I don't quite understand the answer, which is explained in a slightly different context.
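For reference, the loop above can be wrapped in a function that reads the sizes from the array instead of hard-coding 2 and 3. This is only a sketch of the same loop: the function name `mean_last_nonzero` and the parameter `k` are made up for illustration, and a column with no non-zero entries at all would give `nan` (with a NumPy warning).

```python
import numpy as np

def mean_last_nonzero(arr, k=2):
    """Mean of the last k non-zero entries of every column arr[n, :, j].

    Hypothetical helper: same logic as the explicit loop, but the sizes
    come from arr.shape instead of being hard-coded.
    """
    n_blocks, _, n_cols = arr.shape
    out = np.zeros((n_blocks, n_cols))
    for n in range(n_blocks):
        for j in range(n_cols):
            col = arr[n, :, j]
            nonzero = col[col != 0]          # drop every zero in the column
            out[n, j] = nonzero[-k:].mean()  # mean of the last k survivors
    return out

a_3d_array = np.array([[[1, 2, 3],
                        [1, 1, 1],
                        [2, 2, 2],
                        [0, 3, 3],
                        [0, 0, 4]],
                       [[1, 2, 3],
                        [2, 2, 2],
                        [3, 3, 3],
                        [0, 4, 4],
                        [0, 0, 5]]])

result = mean_last_nonzero(a_3d_array)
print(result)
```

With the array from the question this reproduces the same 2 by 3 result as the original loop.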
Upvotes: 0
Views: 121
Reputation: 25023
TL;DR As far as I can tell, Ann's answer is the fastest, once np.mean is replaced by sum(...)/2 (see Edit4)
Here a is the question's a_3d_array. Each m is an i×j 2D array. We then take each row r of its transpose, i.e., the "column" on which to perform the computation. On this "column" we discard ALL the zeros, sum the last two non-zero elements and divide by 2 to get the mean:
In [17]: np.array([[sum(r[r!=0][-2:])/2 for r in m.T] for m in a])
Out[17]:
array([[1.5, 2.5, 3.5],
[2.5, 3.5, 4.5]])
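To make the one-liner easier to follow, here is the same computation unrolled for a single column. It is just an illustration; the intermediate names (`nonzero`, `last_two`, `value`) are mine and do not appear in the one-liner.

```python
import numpy as np

a = np.array([[[1, 2, 3],
               [1, 1, 1],
               [2, 2, 2],
               [0, 3, 3],
               [0, 0, 4]],
              [[1, 2, 3],
               [2, 2, 2],
               [3, 3, 3],
               [0, 4, 4],
               [0, 0, 5]]])

m = a[0]                   # first block, shape (5, 3)
r = m.T[0]                 # first "column" of that block: [1 1 2 0 0]
nonzero = r[r != 0]        # boolean mask drops the zeros: [1 1 2]
last_two = nonzero[-2:]    # last two non-zero elements: [1 2]
value = sum(last_two) / 2  # their mean: 1.5

print(r, nonzero, last_two, value)
```

The nested comprehension simply repeats these steps for every r in m.T and every m in a.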
Edit1
It looks like it's faster than your loop
In [19]: %%timeit
    ...: avg = np.zeros([2,3])
    ...: for n in range(2):
    ...:     for j in range(3):
    ...:         temp = a[n,:,j]
    ...:         nz = temp[np.nonzero(temp)]
    ...:         avg[n, j] = np.mean(nz[-2:])
95.1 µs ± 596 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [20]: %timeit np.array([[sum(r[r!=0][-2:])/2 for r in m.T] for m in a])
45.5 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit2
In [22]: %timeit np.array([[np.mean(list(filter(None, a[n,:,j]))[-2:]) for j in range(3)] for n in range(2)])
145 µs ± 689 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit3
In [25]: %%timeit
...: i = np.indices(a.shape)
...: i[:, a == 0] = -1
...: i = np.sort(i, axis=2)
...: i = i[:, :, -2:, :]
...: a[tuple(i)].mean(axis=1)
64 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Edit4 Breaking News Info
The culprit in Ann's answer is np.mean! Replacing it with sum(...)/2 makes it the fastest of the variants timed here:
In [29]: %timeit np.array([[sum(list(filter(None, a[n,:,j]))[-2:])/2 for j in range(3)] for n in range(2)])
32.7 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
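If you want to repeat the comparison outside IPython, something like the following should work with the standard timeit module. It is a rough sketch: the function names are mine, the repetition count is arbitrary, and the absolute numbers will differ from the ones above depending on machine and NumPy version.

```python
import timeit
import numpy as np

a = np.array([[[1, 2, 3], [1, 1, 1], [2, 2, 2], [0, 3, 3], [0, 0, 4]],
              [[1, 2, 3], [2, 2, 2], [3, 3, 3], [0, 4, 4], [0, 0, 5]]])

def loop_mean(a):
    # The question's explicit double loop with np.mean.
    avg = np.zeros((a.shape[0], a.shape[2]))
    for n in range(a.shape[0]):
        for j in range(a.shape[2]):
            temp = a[n, :, j]
            nz = temp[np.nonzero(temp)]
            avg[n, j] = np.mean(nz[-2:])
    return avg

def mask_sum(a):
    # Boolean mask per column, then sum(...)/2.
    return np.array([[sum(r[r != 0][-2:]) / 2 for r in m.T] for m in a])

def filter_sum(a):
    # filter(None, ...) per column, then sum(...)/2.
    return np.array([[sum(list(filter(None, a[n, :, j]))[-2:]) / 2
                      for j in range(a.shape[2])]
                     for n in range(a.shape[0])])

def index_sort(a):
    # Index-sorting approach: mark zeros with -1, sort, keep the last two.
    i = np.indices(a.shape)
    i[:, a == 0] = -1
    i = np.sort(i, axis=2)[:, :, -2:, :]
    return a[tuple(i)].mean(axis=1)

candidates = [loop_mean, mask_sum, filter_sum, index_sort]
reference = loop_mean(a)
repeats = 2000  # arbitrary; raise it for more stable numbers

for func in candidates:
    # Check that every variant agrees with the loop before timing it.
    assert np.allclose(func(a), reference)
    seconds = timeit.timeit(lambda: func(a), number=repeats)
    print(f"{func.__name__:>10}: {seconds / repeats * 1e6:.1f} µs per call")
```

Checking the results against each other before timing guards against comparing the speed of variants that do not compute the same thing.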
Upvotes: 2
Reputation: 27567
You can use the built-in filter function to filter out the 0s from the arrays.
Here is a list comprehension approach:
import numpy as np
a_3d_array = np.array([[[1, 2, 3],
[1, 1, 1],
[2, 2, 2],
[0, 3, 3],
[0, 0, 4]],
[[1, 2, 3],
[2, 2, 2],
[3, 3, 3],
[0, 4, 4],
[0, 0, 5]]])
aveCol = np.array([[np.mean(list(filter(None, a_3d_array[n,:,j]))[-2:]) for j in range(3)] for n in range(2)])
print(aveCol)
Output:
[[1.5 2.5 3.5]
[2.5 3.5 4.5]]
Note from @gboffi: For efficiency, use
aveCol = np.array([[sum([i for i in a_3d_array[n,:,j] if i][-2:])/2 for j in range(3)] for n in range(2)])
instead of
aveCol = np.array([[np.mean([i for i in a_3d_array[n,:,j] if i][-2:]) for j in range(3)] for n in range(2)])
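One thing to keep in mind when swapping np.mean for sum(...)/2: the two only agree when a column has at least two non-zero elements. The small example below uses a made-up column with a single non-zero entry to show the difference; dividing by the actual length of the slice is one possible way to keep the np.mean behaviour.

```python
import numpy as np

# Hypothetical column with only ONE non-zero element.
col = np.array([0, 0, 3, 0, 0])
tail = [i for i in col if i][-2:]    # [3]: the slice has length 1, not 2

mean_np = np.mean(tail)              # averages over what is there: 3.0
mean_fixed_2 = sum(tail) / 2         # always divides by 2: 1.5
mean_by_len = sum(tail) / len(tail)  # divides by the slice length: 3.0

print(mean_np, mean_fixed_2, mean_by_len)
```

For the array in the question every column has at least two non-zero entries, so both forms give the same output there.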
Upvotes: 0
Reputation: 36765
You can get the indices of your array a, mark zero items with a negative number, sort, keep the last two, and then use the result as an index:
i = np.indices(a.shape)
i[:, a == 0] = -1
i = np.sort(i, axis=2)
i = i[:, :, -2:, :]
a[tuple(i)].mean(axis=1)
# array([[1.5, 2.5, 3.5],
# [2.5, 3.5, 4.5]])
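For anyone who finds the index juggling hard to follow, here are the same steps with the intermediate shapes spelled out. This is only an annotated walk-through; the names `idx`, `picked` and `result` are mine. Note that it relies on every column having at least two non-zero entries: otherwise a -1 marker would survive the slice and index the last element of a.

```python
import numpy as np

a = np.array([[[1, 2, 3], [1, 1, 1], [2, 2, 2], [0, 3, 3], [0, 0, 4]],
              [[1, 2, 3], [2, 2, 2], [3, 3, 3], [0, 4, 4], [0, 0, 5]]])

# One index grid per axis of a: shape (3, 2, 5, 3).
idx = np.indices(a.shape)

# Wherever a is zero, overwrite all three index components with -1.
idx[:, a == 0] = -1

# Sort along the row axis of a (axis 2 of idx): the -1 markers move to
# the front, the indices of non-zero rows stay in ascending order.
idx = np.sort(idx, axis=2)

# Keep the last two entries per column: shape (3, 2, 2, 3).
idx = idx[:, :, -2:, :]

# Use the three components as an index into a: shape (2, 2, 3).
picked = a[tuple(idx)]

# Average the two picked values per column: shape (2, 3).
result = picked.mean(axis=1)
print(result)
```

For the first column of the first block the surviving row indices are 1 and 2, so the values picked are 1 and 2 and their mean is 1.5.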
Upvotes: 0