Reputation: 37
simple example:
a = array([[[1, 0, 0],
[0, 2, 0],
[0, 0, 3]],
[[1, 0, 0],
[0, 1, 0],
[0, 0, 1]]])
result = []
for i in a:
result.append(i.sum())
result = [6, 3]
Is there a numpy function doing this faster? If it helps: a contains only diagonal matrices.
Edit: I just realized that a contains scipy csc_sparse matrices, i.e. its a numpy 1D array containing matrices and i can not apply the sum function with axis=(1, 2)
Upvotes: 0
Views: 181
Reputation: 26886
A proper use of the axis
parameter of np.sum()
would do:
import numpy as np
np.sum(a, axis=(1, 2))
# [6, 3]
While the above should be generic preferred method, if your input is actually diagonal over axis 1 and 2, then summing all the zeros is bound to be inefficient (read O(n² k)
with same n
and k
as the gen_a()
function below). Using np.sum()
after np.diag()
inside a loop can be much better (read O(n k)
with same n
and k
as before). Possibly, using a list
comprehension is the way to go:
import numpy as np
np.array([np.sum(np.diag(x)) for x in a])
# [3, 6]
To give some idea of the relative speed, let's write a function to generate inputs of arbitrary size:
def gen_a(n, k):
return np.array([
np.diag(np.ones(n, dtype=int))
if i % 2 else
np.diag(np.arange(1, n + 1, dtype=int))
for i in range(k)])
print(gen_a(3, 2))
# [[[1 0 0]
# [0 2 0]
# [0 0 3]]
# [[1 0 0]
# [0 1 0]
# [0 0 1]]]
Now, we can time for different input size. I have also included a list
comprehension without the np.diag()
call, which is fundamentally a slightly more concise version of your approach.
a = gen_a(3, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100000 loops, best of 3: 16 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100000 loops, best of 3: 4.51 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100000 loops, best of 3: 10 µs per loop
a = gen_a(3000, 2)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 10000 loops, best of 3: 20.5 µs per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 17.8 ms per loop
a = gen_a(3, 2000)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 100 loops, best of 3: 14.8 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 10000 loops, best of 3: 34 µs per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 8.93 ms per loop
a = gen_a(300, 200)
%timeit np.array([np.sum(np.diag(x)) for x in a])
# 1000 loops, best of 3: 1.67 ms per loop
%timeit np.sum(a, axis=(1, 2))
# 100 loops, best of 3: 17.8 ms per loop
%timeit np.array([np.sum(x) for x in a])
# 100 loops, best of 3: 19.3 ms per loop
And we observe that depending on the value of n
and k
one or the other solution gets faster.
For larger n
, the list comprehension gets faster, but only if np.diag()
is used.
On the contrary, for smaller n
and larger k
, np.sum()
raw speed can outperform the explicit looping.
Upvotes: 4