Reputation: 5863
I am working with a large ordered list (100k+ entries, ordered by timestamp), but unfortunately it contains blocks of repeated timestamps like so:
....
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
I would like to average the values across these duplicate timestamps and replace each block in the list with a single pair like so:
...
2014-10-07T11:07:22.735Z, <the_mean_value_across_duplicate_timestamps>
where, in this case, <the_mean_value_across_duplicate_timestamps>
is simply 1.5250000000000000E+03
What would be the most efficient way to achieve this with Python 3 and NumPy? I could write a for
loop, but I presume that is not the most efficient way of doing things.
Upvotes: 0
Views: 79
Reputation: 325
I'm not sure exactly what you want, but is it something like this?
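import numpy as np
# Column 0 is the key (stand-in for the timestamp), column 1 is the value.
a = np.array([[1, 1], [1, 1], [1, 1],
              [2, 2],
              [3, 3], [3, 3], [3, 3], [3, 3],
              [4, 4], [4, 4]])
# Distinct keys, sorted.
n = np.unique(a[:, 0])
# For each key, average the values of all rows that share it.
print(np.array([[i, np.mean(a[a[:, 0] == i, 1])] for i in n]))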
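Note that the list comprehension above builds a boolean mask over the whole array once per unique key, which may get slow with 100k+ rows and many distinct timestamps. A possible fully vectorized sketch, assuming the timestamps and values sit in two separate sequences (the function name `mean_by_key` and the sample data are made up for illustration), uses `np.unique(..., return_inverse=True)` together with `np.bincount`:

```python
import numpy as np

def mean_by_key(keys, values):
    """Average `values` over groups of equal `keys` (hypothetical helper)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # uniq: sorted distinct keys; inverse[i]: index into uniq of keys[i]
    uniq, inverse = np.unique(keys, return_inverse=True)
    # Per-group sum of values and per-group row count, one pass each
    sums = np.bincount(inverse, weights=values, minlength=uniq.size)
    counts = np.bincount(inverse, minlength=uniq.size)
    return uniq, sums / counts

# Made-up sample: three duplicate timestamps followed by a distinct one
timestamps = ["2014-10-07T11:07:22.735Z"] * 3 + ["2014-10-07T11:07:23.735Z"]
readings = [1525.0, 1525.0, 1525.0, 1600.0]
uniq_ts, means = mean_by_key(timestamps, readings)
for t, m in zip(uniq_ts, means):
    print(t, m)
```

The timestamps are kept as plain strings here; `np.unique` sorts them lexicographically, which matches chronological order only as long as they all share the same fixed-width ISO format shown in the question.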
Upvotes: 1