Reputation: 285
I have a dataset that has two columns, column 1 is the time which goes from 1 to 9 seconds and column 2 is the probability of an event in a specific second with values of 30, 69, 56, 70, 90, 59, 87, 10, 20.
I am trying to get the average probability in a time interval (after 2 seconds for this case), like the probability between 2 to 3 seconds, 2 to 4 seconds, 2 to 5 seconds,....2 to 9 seconds.
I tried the following approach, where I defined an array t_inc
with increments of 1 starting above 2. However, I am getting the following error message (P_slice_avg_1
in the code):
operands could not be broadcast together with shapes (9,) (7,)
because my t_inc has a shape of (7,).
When I tried to do it manually (P_slice_avg_2
in the code) it works, but this is not feasible if I want to do it for a huge number of intervals.
Any help in how to generalize it would be greatly helpful.
import numpy as np
data=np.loadtxt('C:/Users/Hrihaan/Desktop/Sample.txt')
t=data[:,0] # t goes from 1 to 9
P=data[:,1] # probability of an event in a specific second
i = np.arange(1, 8, 1)
t_inc = 2 + i
P_slice_avg_1 = np.mean(P[(t>=2) & (t<=t_inc)]) # I thought this would give me the averages between 2 and the values of t_inc
P_slice_avg_2= np.mean(P[(t>=2) & (t<=3)]), np.mean(P[(t>=2) & (t<=4)]), np.mean(P[(t>=2) & (t<=5)]), np.mean(P[(t>=2) & (t<=6)]), np.mean(P[(t>=2) & (t<=7)]), np.mean(P[(t>=2) & (t<=8)]), np.mean(P[(t>=2) & (t<=9)])
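For reference, the broadcast fails because t has shape (9,) while t_inc has shape (7,), so the element-wise comparison t <= t_inc is not defined. The manual version above can be generalized with a plain list comprehension; a minimal sketch using the sample values from the question:

```python
import numpy as np

# Sample data from the question: t = 1..9 seconds, P = event probabilities.
t = np.arange(1, 10)
P = np.array([30, 69, 56, 70, 90, 59, 87, 10, 20], dtype=float)

# Upper endpoints of the intervals [2, 3], [2, 4], ..., [2, 9].
t_inc = 2 + np.arange(1, 8)

# One mean per interval: loop over the upper endpoints instead of
# comparing t against the whole t_inc array at once.
P_slice_avg = [np.mean(P[(t >= 2) & (t <= u)]) for u in t_inc]
```

This scales to any number of intervals, although it is a Python-level loop rather than a fully vectorized computation.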
Upvotes: 1
Views: 110
Reputation: 6475
Here is a vectorized approach exploiting NumPy broadcasting:
import numpy as np
t = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
P = np.array([30, 69, 56, 70, 90, 59, 87, 10, 20], dtype=float)
i = np.arange(1, 8, 1)
t_inc = 2 + i
T = np.tile(t[:, None], len(i))
P = np.tile(P[:, None], len(i))
np.tile constructs an array by repeating the input the given number of times; in this case we get len(i)
copies of t
and of P
as columns, namely:
P
array([[30., 30., 30., 30., 30., 30., 30.],
[69., 69., 69., 69., 69., 69., 69.],
[56., 56., 56., 56., 56., 56., 56.],
[70., 70., 70., 70., 70., 70., 70.],
[90., 90., 90., 90., 90., 90., 90.],
[59., 59., 59., 59., 59., 59., 59.],
[87., 87., 87., 87., 87., 87., 87.],
[10., 10., 10., 10., 10., 10., 10.],
[20., 20., 20., 20., 20., 20., 20.]])
Now we set to zero all the elements that do not satisfy the required condition, using np.logical_or:
P[np.logical_or(2 > T, T > t_inc)] = 0
P
array([[ 0., 0., 0., 0., 0., 0., 0.],
[69., 69., 69., 69., 69., 69., 69.],
[56., 56., 56., 56., 56., 56., 56.],
[ 0., 70., 70., 70., 70., 70., 70.],
[ 0., 0., 90., 90., 90., 90., 90.],
[ 0., 0., 0., 59., 59., 59., 59.],
[ 0., 0., 0., 0., 87., 87., 87.],
[ 0., 0., 0., 0., 0., 10., 10.],
[ 0., 0., 0., 0., 0., 0., 20.]])
In this way each column stores exactly the elements to average. However, using np.mean
would yield the wrong result, since the denominator would be P.shape[0]
, i.e. it would also count the zeroed elements. As a workaround we can sum along the axis and divide by the number of non-zero elements in each column using np.count_nonzero
(note this relies on none of the original probabilities being exactly zero, otherwise they would be dropped from the count):
np.sum(P, axis=0)/np.count_nonzero(P, axis=0)
array([62.5, 65., 71.25, 68.8, 71.83333333, 63., 57.625])
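For completeness, the same mask can be built directly with broadcasting, which avoids the np.tile copies and does not rely on np.count_nonzero (so it would also handle probabilities that are exactly zero). A self-contained sketch with the same sample data:

```python
import numpy as np

t = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
P = np.array([30, 69, 56, 70, 90, 59, 87, 10, 20], dtype=float)
t_inc = 2 + np.arange(1, 8)

# Boolean mask: entry (j, k) is True when 2 <= t[j] <= t_inc[k].
# t[:, None] has shape (9, 1) and broadcasts against t_inc of shape (7,),
# giving a (9, 7) mask without explicitly tiling t or P.
mask = (t[:, None] >= 2) & (t[:, None] <= t_inc)

# Sum the selected probabilities per column and divide by how many
# elements each interval actually contains.
averages = (P[:, None] * mask).sum(axis=0) / mask.sum(axis=0)
```

mask.sum(axis=0) counts the selected positions rather than the non-zero values, so the denominator is correct regardless of the probability values themselves.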
Upvotes: 1