Reputation: 477
I have this loop to calculate a value for rows that share the same datetime in a dataframe:
import numpy as np
import pandas as pd

for epoch in data_all['EPOCH'].unique():
    data_epoch = data_all.query('EPOCH == @epoch')  # rows sharing this datetime
    data_epoch['SIGMA'] = pd.to_numeric(data_epoch['SIGMA'])
    variance = np.mean(data_epoch['SIGMA'].values ** 2)  # mean of squares per epoch
But this is very slow. Could you suggest a way to do it faster?
Thank you
Upvotes: 1
Views: 61
Reputation: 150825
This is just a groupby:
variances = data_all.groupby('EPOCH')['SIGMA'].var()
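A minimal sketch on a made-up frame (EPOCH and SIGMA are the question's column names; the sample values are invented) shows the shape of the result, one variance per unique EPOCH:

import pandas as pd

data_all = pd.DataFrame({
    'EPOCH': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'],
    'SIGMA': [1.0, 3.0, 2.0, 2.5],
})
print(data_all.groupby('EPOCH')['SIGMA'].var())
# EPOCH
# 2021-01-01    2.000
# 2021-01-02    0.125
# Name: SIGMA, dtype: float64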
Or if you want to use your formula:
variances = (data_all['SIGMA']**2).groupby(data_all['EPOCH']).mean()
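One caveat: .var() is the sample variance (ddof=1, dividing by n-1), while the mean-of-squares expression reproduces the question's formula exactly, so pick whichever you actually want. Also, if SIGMA is stored as text (which the pd.to_numeric call in the question suggests), convert it once up front instead of once per group:

data_all['SIGMA'] = pd.to_numeric(data_all['SIGMA'])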
Update: for your add-on question:
variances = data_all.groupby('EPOCH')['SIGMA'].transform('var')
data_all['GROUP'] = (variances<1).astype(int)
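On the made-up frame above, transform('var') broadcasts each group's variance back onto its rows, so the comparison lines up element-wise:

variances = data_all.groupby('EPOCH')['SIGMA'].transform('var')
# 0    2.000
# 1    2.000
# 2    0.125
# 3    0.125
data_all['GROUP'] = (variances < 1).astype(int)
# GROUP is 0 for the 2021-01-01 rows and 1 for the 2021-01-02 rows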
Upvotes: 1