Reputation: 1481
I have a pair of equal length numpy arrays. dwells
contains float numbers representing dwell times, and ids
represents a state. In my example there are only 3 unique states labeled 0
, 1
, 2
.
dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])
Previous 2 arrays model a system that starts in state 2
, stays there for 4.3
seconds, jumps to state 0
, stays for 0.2
seconds and so on.
I would like to generate another numpy array. It needs as many columns as dwells.sum()
, each representing a whole number 0,1,2,3... indicating time. Each row matches one of the unique states (in this case 3). Each element of this array represents the relative contribution of each state during that time. For example, during the first 4 time points, only state 2 has any contribution, and thus the 1st 4 elements of row 2 are equal to 1
. The fifth column has contribution from all 3 states but the sum
equals 1
.
[[0, 0, 0, 0, 0.2, 0, 0, 0, 0]
[0, 0, 0, 0, 0.5, 1, 1, 0.5, 0]
[1, 1, 1, 1, 0.3, 0, 0, 0.5, 1]]
I could do that with a for
loop but I wonder if there is a more efficient vectorized way.
Upvotes: 2
Views: 143
Reputation: 5935
Assuming we have a smallest possible timestep of delta
:
import numpy as np
dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])
def dwell_map(dwells, ids, delta=0.1):
import numpy as np
import sys
idelta = 1 / delta
# ensure that idelta is an integer number
if not idelta.is_integer():
raise ValueError("1/delta is not integer")
idelta = int(idelta)
# create new longer dwells array
dwells_l = (dwells*idelta).astype(int)
# create target array
a = np.zeros((ids.max()+1, dwells_l.sum().astype(int)), dtype=int)
# create repeats of the ids according to the dwell time
ind = np.repeat(ids, dwells_l)
# put ones at the position where we have the indices
a[ind, np.arange(ind.size)] = 1
# reduce back to the original time resolution
a = a.reshape(ids.max()+1, -1, idelta).sum(axis=2)/idelta
return a
res = dwell_map(dwells, ids, 0.1)
This will only work well if delta is large enough, and the total duration is small enough, so that the intermediate arrays do not grow "infinitely" large.
The performance according to the iPython %timeit
magic for your example arrays, to compare it to your for-loop solution:
10000 loops, best of 5: 58.5 µs per loop
Upvotes: 2