jatorna

Reputation: 477

Optimize pandas loop

I use this loop to calculate a value (the mean of the squared SIGMA values) for each group of rows that share the same datetime (EPOCH) in a DataFrame:

import numpy as np
import pandas as pd

variances = {}
for epoch in data_all['EPOCH'].unique():
    data_epoch = data_all.query('EPOCH == @epoch')
    sigma = pd.to_numeric(data_epoch['SIGMA'])  # convert without writing into a filtered copy
    variances[epoch] = np.mean(sigma.values ** 2)
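For reference, here is a minimal reproducible version of the loop. It assumes a hypothetical data_all with two epochs and SIGMA stored as strings (which would explain the pd.to_numeric call); the values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two epochs, SIGMA stored as strings
data_all = pd.DataFrame({
    "EPOCH": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00",
                             "2020-01-01 00:01", "2020-01-01 00:01"]),
    "SIGMA": ["1.0", "3.0", "2.0", "4.0"],
})

# The per-epoch loop, keeping each result instead of overwriting it
variances = {}
for epoch in data_all["EPOCH"].unique():
    data_epoch = data_all.query("EPOCH == @epoch")   # one filtering pass per epoch
    sigma = pd.to_numeric(data_epoch["SIGMA"])       # string -> float, per epoch
    variances[epoch] = np.mean(sigma.values ** 2)    # mean of squared values

print(variances)
```

Each iteration scans the whole frame again in query, so the cost grows with (number of epochs) × (number of rows), which is why this gets slow on large data.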

But it is very slow. Could you suggest a faster way to do this?

Thank you

Upvotes: 1

Views: 61

Answers (1)

Quang Hoang

Reputation: 150825

This is just a groupby:

variances = data_all.groupby('EPOCH')['SIGMA'].var()
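Note that var() is not the same quantity the question's loop computes: pandas' var subtracts each group's mean and divides by n - 1 (ddof=1) by default, while the loop averages the raw squared values. A small sketch on hypothetical data (epoch labels simplified to strings) shows the two variance flavours groupby offers:

```python
import pandas as pd

# Hypothetical data: two epochs with two SIGMA values each
data_all = pd.DataFrame({
    "EPOCH": ["t0", "t0", "t1", "t1"],
    "SIGMA": [1.0, 3.0, 2.0, 4.0],
})

# Sample variance per epoch: sum((x - mean)**2) / (n - 1)
sample_var = data_all.groupby("EPOCH")["SIGMA"].var()

# Population variance per epoch: sum((x - mean)**2) / n
pop_var = data_all.groupby("EPOCH")["SIGMA"].var(ddof=0)

print(sample_var)
print(pop_var)
```

Here both epochs give a sample variance of 2.0 and a population variance of 1.0, whereas the mean of squares would be 5.0 and 10.0, so pick the formula that matches what SIGMA represents.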

Or, if you want to use your formula (the mean of the squared values):

variances = (data_all['SIGMA']**2).groupby(data_all['EPOCH']).mean()
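A sketch of the formula version end to end, assuming (as the question's pd.to_numeric call suggests) that SIGMA may arrive as strings; the data is hypothetical. Converting the whole column once replaces the per-epoch conversion, and the result is cross-checked against a loop over the same groups:

```python
import numpy as np
import pandas as pd

# Hypothetical data; SIGMA stored as strings
data_all = pd.DataFrame({
    "EPOCH": ["t0", "t0", "t1", "t1"],
    "SIGMA": ["1.0", "3.0", "2.0", "4.0"],
})

# Convert the whole column once instead of once per epoch
data_all["SIGMA"] = pd.to_numeric(data_all["SIGMA"])

# Vectorised: square every value, then average within each epoch
variances = (data_all["SIGMA"] ** 2).groupby(data_all["EPOCH"]).mean()

# Cross-check: iterating a SeriesGroupBy yields (epoch, values) pairs
loop_result = {
    epoch: np.mean(group.values ** 2)
    for epoch, group in data_all.groupby("EPOCH")["SIGMA"]
}

print(variances)
print(loop_result)
```

The groupby version touches each row a constant number of times, instead of rescanning the frame once per epoch as the query-based loop does.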

Update: for your follow-up question:

variances = data_all.groupby('EPOCH')['SIGMA'].transform('var')
data_all['GROUP'] = (variances<1).astype(int)
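The key difference from the earlier var() call is that transform('var') returns one value per row, aligned with data_all's index, so it can be compared and assigned directly as a new column. A sketch on hypothetical data, using the threshold of 1 from the answer:

```python
import pandas as pd

# Hypothetical data: a low-variance epoch and a high-variance epoch
data_all = pd.DataFrame({
    "EPOCH": ["t0", "t0", "t1", "t1", "t1"],
    "SIGMA": [1.0, 1.5, 2.0, 4.0, 6.0],
})

# Aggregation: one value per epoch, indexed by EPOCH
per_epoch = data_all.groupby("EPOCH")["SIGMA"].var()

# Transform: the same values broadcast back to every row
per_row = data_all.groupby("EPOCH")["SIGMA"].transform("var")

# 1 if the row's epoch has a sample variance below 1, else 0
data_all["GROUP"] = (per_row < 1).astype(int)
print(data_all)
```

Here epoch t0 has a sample variance of 0.125 and t1 has 4.0, so only the two t0 rows get GROUP = 1.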

Upvotes: 1
