AJW

python3 numpy averaging across duplicate timestamp values

I am working with a large list (100k+ entries, ordered by timestamp), but unfortunately it contains blocks of repeated timestamps, like so:

....
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03

I would like to average the values across these duplicate timestamps and replace each block in the list with a single pair, like so:

...
2014-10-07T11:07:22.735Z, <the_mean_value_across_duplicate_timestamps>

where, in this case, <the_mean_value_across_duplicate_timestamps> is simply 1.5250000000000000E+03.

What would be the most efficient way to achieve this with python3 and numpy? I could write a for loop, but I presume that is not the most efficient approach.
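Since the question asks for numpy specifically, here is a minimal vectorized sketch (not taken from either answer below), assuming the timestamps are kept as strings in one array and the values in a parallel float array; the sample data, including the 23.100Z rows, is hypothetical. np.unique with return_inverse=True labels every row with the index of its group, and np.bincount then sums the values and counts the rows per group without a Python-level loop.

```python
import numpy as np

# Hypothetical sample data: ISO-8601 timestamps as strings, values as floats
timestamps = np.array([
    "2014-10-07T11:07:22.735Z",
    "2014-10-07T11:07:22.735Z",
    "2014-10-07T11:07:22.735Z",
    "2014-10-07T11:07:23.100Z",
    "2014-10-07T11:07:23.100Z",
])
values = np.array([1525.0, 1525.0, 1525.0, 10.0, 20.0])

# uniq: sorted distinct timestamps; inverse[i]: index of timestamps[i] in uniq
uniq, inverse = np.unique(timestamps, return_inverse=True)

# Sum the values in each group, count the rows in each group, then divide
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
means = sums / counts

for t, m in zip(uniq, means):
    print(t, m)
```

np.unique sorts its input, so this is O(n log n). ISO-8601 strings in an identical format sort lexicographically in chronological order, so uniq comes out in time order.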

Answers (2)

CYC

I'm not sure exactly what you want, but is it something like this?

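import numpy as np

# Column 0 is the key (e.g. a timestamp), column 1 is the value to average
a = np.array([[1, 1], [1, 1], [1, 1],
              [2, 2],
              [3, 3], [3, 3], [3, 3], [3, 3],
              [4, 4], [4, 4]])
# Distinct keys, in sorted order
n = np.unique(a[:, 0])
# For each key, select the rows with that key and average their values
print(np.array([[i, np.mean(a[a[:, 0] == i, 1])] for i in n]))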
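The loop above rescans the whole array once per distinct key. Since the question says the list is already ordered by timestamp, duplicates form contiguous runs, so a sketch like the following (assuming sorted, non-empty input) can locate the start of each run and sum every run in a single np.add.reduceat call. Comparing keys[1:] != keys[:-1] works for numeric keys and for timestamp strings alike.

```python
import numpy as np

# Same layout as above: column 0 is the (already sorted) key, column 1 the value
a = np.array([[1, 1], [1, 1], [1, 1],
              [2, 2],
              [3, 3], [3, 3], [3, 3], [3, 3],
              [4, 4], [4, 4]], dtype=float)
keys, vals = a[:, 0], a[:, 1]

# Start index of each run: position 0 plus every index where the key changes
starts = np.concatenate(([0], np.flatnonzero(keys[1:] != keys[:-1]) + 1))

# Sum each run in one pass, then divide by the run lengths
sums = np.add.reduceat(vals, starts)
counts = np.diff(np.append(starts, len(vals)))
result = np.column_stack((keys[starts], sums / counts))
print(result)
```

This is O(n) and never sorts, but it relies on the input really being grouped; if equal timestamps could appear in separate places, use np.unique instead.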

magraf

Unfortunately, you did not state the column names, but I would recommend using pandas' groupby: group the rows by timestamp, then calculate the mean of the values in each group.

# Group rows by the (assumed) 'timestamp' column; axis=1 would group columns instead
df.groupby('timestamp', as_index=False).mean()
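A fuller, runnable version of the pandas approach, assuming the data arrives as the question's comma-separated "timestamp, value" lines without a header row. The column names and the extra 23.100Z rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical input in the question's "timestamp, value" format (no header)
raw = """2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:22.735Z, 1.5250000000000000E+03
2014-10-07T11:07:23.100Z, 1.0000000000000000E+01
2014-10-07T11:07:23.100Z, 2.0000000000000000E+01
"""

# Column names are assumptions; skipinitialspace drops the blank after each comma
df = pd.read_csv(io.StringIO(raw), header=None, names=["timestamp", "value"],
                 skipinitialspace=True)

# Group rows by timestamp and average each group;
# sort=False keeps the groups in their original order
result = df.groupby("timestamp", as_index=False, sort=False)["value"].mean()
print(result)
```

With as_index=False the timestamps stay an ordinary column, so result has the same two-column shape as the input, one row per distinct timestamp.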
