Elham
Elham

Reputation: 867

group by within group by in pandas

Consider the following dataset:

min    5-min     a
0       0        800
0       0        801
1       0        802
1       0        803
1       0        804
2       0        805
2       0        805
2       0        810
3       0        801
3       0        802
3       0        803
4       0        804
4       0        805
5       1        806
5       1        800
5       1        890
6       1        890
6       1        880
6       1        800
7       1        804
7       1        806
8       1        801
9       1        800
9       1        900
10      1        770
10      1        803
10      1        811

I need to calculate std of a on each group based on the minute and then calculate the mean of the results values in each group of 5 min. I do not know how to find the border of 5 min, after calculation of std. How should I save the data to know which std belong to each group of 5 min?

data.groupby('minute').a.std()

I would appreciate of any help.

Upvotes: 3

Views: 5283

Answers (2)

Paula Livingstone
Paula Livingstone

Reputation: 1215

Taskos answer is great but I wasn't sure if you needed the data to be pushed back into the dataframe or not. Assuming what you want is to add the new columns in the parent after each groupby operation, Ive opted to do that for you as follows

import pandas as pd

df = your_df

# First we create the standard deviation column

def add_std(grp):
    grp['stdevs'] = grp['a'].std()
    return grp

df = df.groupby('min').apply(add_std)

# Next we create the 5 minute mean column

def add_meandev(grp):
    grp['meandev'] = grp['stdevs'].mean()
    return grp

print(df.groupby('5-min').apply(add_meandev))

This can be done more elegantly by chaining etc but I have opted to lay it out like this so that the underlying process is more visible to you.

The final output from this will look like the following:

    min  5-min    a     stdevs    meandev
0     0      0  800   0.707107   1.345283
1     0      0  801   0.707107   1.345283
2     1      0  802   1.000000   1.345283
3     1      0  803   1.000000   1.345283
4     1      0  804   1.000000   1.345283
5     2      0  805   2.886751   1.345283
6     2      0  805   2.886751   1.345283
7     2      0  810   2.886751   1.345283
8     3      0  801   1.000000   1.345283
9     3      0  802   1.000000   1.345283
10    3      0  803   1.000000   1.345283
11    4      0  804   0.707107   1.345283
12    4      0  805   0.707107   1.345283
13    5      1  806  50.318983  39.107147
14    5      1  800  50.318983  39.107147
15    5      1  890  50.318983  39.107147
16    6      1  890  49.328829  39.107147
17    6      1  880  49.328829  39.107147
18    6      1  800  49.328829  39.107147
19    7      1  804   1.414214  39.107147
20    7      1  806   1.414214  39.107147
21    8      1  801        NaN  39.107147
22    9      1  800  70.710678  39.107147
23    9      1  900  70.710678  39.107147
24   10      1  770  21.733231  39.107147
25   10      1  803  21.733231  39.107147
26   10      1  811  21.733231  39.107147

Upvotes: 3

Tasko Olevski
Tasko Olevski

Reputation: 476

Not 100% clear on what you are asking... but I think this is what you need:

data.groupby(['min','5-min']).std().groupby('5-min').mean()

This finds the standard deviation based on the 5-min column of the means calculated based on the 'min' column.

Upvotes: 2

Related Questions