python3 numpy averaging across duplicate timestamp values

Question

I am working with a large ordered list (100k+ entries, ordered by timestamp), but unfortunately it contains blocks of repeated timestamps like so:

....
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03

I would like to average across these duplicate timestamps and replace them on the list with just one pair like so:

...
2014-10-07T11:07:22.735Z, <average>

where, in this case, the average is simply 1.5250000000000000E+03

What would be the most efficient way to achieve this with python3 and numpy? I could of course write a for loop, but I presume that is not the most efficient way of doing things.

CYC · Accepted Answer

I'm not sure exactly what you want. Is it something like this?

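import numpy as np

# Column 0 is the key (standing in for the timestamp), column 1 is the value.
a = np.array([[1, 1], [1, 1], [1, 1],
              [2, 2],
              [3, 3], [3, 3], [3, 3], [3, 3],
              [4, 4], [4, 4]])
# Sorted unique keys.
n = np.unique(a[:, 0])
# For each unique key, average the values of all rows that share it.
print(np.array([[i, np.mean(a[a[:, 0] == i, 1])] for i in n]))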
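The list comprehension above scans the whole array once per unique key, which can get slow when a 100k+ list has many distinct timestamps. A possible alternative, sketched here under some assumptions, groups everything in one pass with `np.unique(..., return_inverse=True)` and `np.bincount`. The helper name `average_duplicates` and the sample data are made up for illustration. The timestamps are kept as plain strings, which assumes they all share the same ISO 8601 layout so that string order matches time order.

```python
import numpy as np

def average_duplicates(timestamps, values):
    """Hypothetical helper: average `values` over equal `timestamps`."""
    timestamps = np.asarray(timestamps)
    values = np.asarray(values, dtype=float)
    # unique_ts is sorted; inverse[k] is the index in unique_ts of row k.
    unique_ts, inverse = np.unique(timestamps, return_inverse=True)
    # Sum of values per group, then number of rows per group.
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return unique_ts, sums / counts

# Made-up sample: three rows share one timestamp, two share the next.
ts = ["2014-10-07T11:07:22.735Z"] * 3 + ["2014-10-07T11:07:22.736Z"] * 2
vals = [1525.0, 1525.0, 1525.0, 1000.0, 2000.0]

unique_ts, means = average_duplicates(ts, vals)
for t, m in zip(unique_ts, means):
    print(t, m)
```

Each timestamp appears once in `unique_ts`, paired with the mean of its values in `means`. If the timestamps need to be real dates afterwards, they can be converted once the duplicates are gone.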
