Reputation: 15798
I have two arrays P and T. P[i] is a number, whose time stamp is T[i]; There might be duplicated time stamps.
I want to produce another two arrays Q and U, where Q[i] has time stamp U[i], and Q[i] is the sum of all elements in P that have time stamp U[i];
For example, for
P = [1, 2, 3, 4, 5] T = [0, 0, 1, 1, 1]
I will produce
Q = [3, 12] U = [0, 1];
Is there a fast way of doing this in numpy, that hopefully vectorizes it?
Upvotes: 3
Views: 876
Reputation: 879133
Using numpy 1.4 or better:
import numpy as np
P = np.array([1, 2, 3, 4, 5])
T = np.array([0, 0, 1, 1, 1])
U,inverse = np.unique(T,return_inverse=True)
Q = np.bincount(inverse,weights=P)
print (Q, U)
# (array([ 3., 12.]), array([0, 1]))
Please note that this is not the fastest solution. I tested the speed this way:
import numpy as np
N = 1000
P = np.repeat(np.array([1, 2, 3, 4, 5]),N)
T = np.repeat(np.array([0, 0, 1, 1, 1]),N)
def using_bincount():
U,inverse = np.unique(T,return_inverse=True)
Q = np.bincount(inverse,weights=P)
return Q,U
# (array([ 3., 12.]), array([0, 1]))
def using_lc():
U = list(set(T))
Q = [sum([p for (p,t) in zip(P,T) if t == u]) for u in U]
return Q,U
def using_slice():
U = np.unique(T)
Q = np.array([P[T == u].sum() for u in U])
return Q,U
For small arrays, wim's solution is faster (N=1):
% python -mtimeit -s'import test' 'test.using_lc()'
100000 loops, best of 3: 18.4 usec per loop
% python -mtimeit -s'import test' 'test.using_slice()'
10000 loops, best of 3: 66.8 usec per loop
% python -mtimeit -s'import test' 'test.using_bincount()'
10000 loops, best of 3: 52.8 usec per loop
For large arrays, joris's solution is faster (N=1000):
% python -mtimeit -s'import test' 'test.using_lc()'
100 loops, best of 3: 9.93 msec per loop
% python -mtimeit -s'import test' 'test.using_slice()'
1000 loops, best of 3: 390 usec per loop
% python -mtimeit -s'import test' 'test.using_bincount()'
1000 loops, best of 3: 846 usec per loop
I doubt it matters in this case, but benchmarks can change depending on version of numpy, python, OS, or hardware. It would not hurt to repeat these benchmarks on your machine.
Upvotes: 4
Reputation: 139142
import numpy as np
P = np.array([1, 2, 3, 4, 5])
T = np.array([0, 0, 1, 1, 1])
U = np.unique(T)
Q = np.array([P[T == u].sum() for u in U])
gives
In [17]: print Q, U
[3 12] [0 1]
Not really vectorized, but faster than the solution with lists.
If you want more powerful of this kind of group-by functions, maybe you can take a look at pandas.
Upvotes: 2
Reputation: 362507
>>> P = [1, 2, 3, 4, 5]; T = [0, 0, 1, 1, 1]
>>> U = list(set(T))
>>> Q = [sum([p for (p,t) in zip(P,T) if t == u]) for u in U]
>>> print Q, U
[3, 12] [0, 1]
Upvotes: 1