Andrei
Andrei

Reputation: 133

Pandas - Iterate over a distinct value of a column and get mean()/std()

Please assist me with the following case.

I have a Pandas DataFrame that looks something like this:

DATE; NAME; COUNT1; COUNT2; COUNT3;
01; Name1; 14; 3; 6;
02; Name1; 17; 4; 2;
03; Name1; 11; 1; 9;
01; Name2; 14; 3; 6;
02; Name2; 17; 4; 2;
03; Name2; 11; 1; 9;
01; Name3; 14; 3; 6;
02; Name3; 11; 1; 9;
03; Name3; 17; 4; 2;

I need to get the mean() and std() values for every name (COUNT3, for example).

Here is how the result should look:

Name1 std(): value
Name1 mean(): value
Name2 std(): value
Name2 mean(): value
Name3 std(): value
Name3 mean(): value

Thanks in advance!

Upvotes: 1

Views: 499

Answers (1)

jezrael
jezrael

Reputation: 863301

I think you need groupby and aggregate by agg:

df1 = df.groupby('NAME')['COUNT3'].agg(['mean','std'])
print (df1)
           mean       std
NAME                     
Name1  5.666667  3.511885
Name2  5.666667  3.511885
Name3  5.666667  3.511885

If you want change format of output add unstack:

df1 = df.groupby('NAME')['COUNT3']
        .agg(['mean','std'])
        .unstack()
        .swaplevel(0,1)
        .rename_axis(('names','aggreg'))
        .reset_index(name='val')
print (df1)
   names aggreg       val
0  Name1   mean  5.666667
1  Name2   mean  5.666667
2  Name3   mean  5.666667
3  Name1    std  3.511885
4  Name2    std  3.511885
5  Name3    std  3.511885

And for more columns together:

df2 = df.groupby('NAME')['COUNT1','COUNT2','COUNT3'].agg(['mean','std'])
print (df2)
      COUNT1         COUNT2              COUNT3          
        mean  std      mean       std      mean       std
NAME                                                     
Name1     14  3.0  2.666667  1.527525  5.666667  3.511885
Name2     14  3.0  2.666667  1.527525  5.666667  3.511885
Name3     14  3.0  2.666667  1.527525  5.666667  3.511885

df2 = df.groupby('NAME')['COUNT1','COUNT2','COUNT3']
        .agg(['mean','std'])
        .unstack()
        .reorder_levels((2,1,0))
        .rename_axis(('names','aggreg','columns'))
        .reset_index(name='val')
print (df2)
    names aggreg columns        val
0   Name1   mean  COUNT1  14.000000
1   Name2   mean  COUNT1  14.000000
2   Name3   mean  COUNT1  14.000000
3   Name1    std  COUNT1   3.000000
4   Name2    std  COUNT1   3.000000
5   Name3    std  COUNT1   3.000000
6   Name1   mean  COUNT2   2.666667
7   Name2   mean  COUNT2   2.666667
8   Name3   mean  COUNT2   2.666667
9   Name1    std  COUNT2   1.527525
10  Name2    std  COUNT2   1.527525
11  Name3    std  COUNT2   1.527525
12  Name1   mean  COUNT3   5.666667
13  Name2   mean  COUNT3   5.666667
14  Name3   mean  COUNT3   5.666667
15  Name1    std  COUNT3   3.511885
16  Name2    std  COUNT3   3.511885
17  Name3    std  COUNT3   3.511885

Upvotes: 2

Related Questions