John Minze
John Minze

Reputation: 127

Python Pandas trying to get the mode with my other statitics

I have csv data:

Class,  Code,   Vendor, State,  NumberOfDays
3,      123,    Name1,  NE,     12.58402778
1,      876,    Name2,  TX,     12.51041667
3,      123,    Name1,  NE,     2.354166667
1,      876,    Name2,  TX,     12.21111111
3,      456,    Name2,  NY,     6.346527778
2,      876,    Name1,  NY,     5.513194444
3,      123,    Name1,  NE,     5.38125
1,      876,    Name2,  TX,     5.409722222

And I have the follow code that gives me the mean, min, max and median of the NumberOfDays field based on groupings of the other fields

df = pd.read_csv(r'C:...input.csv')

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays'}
df = (df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']
    .agg(['mean','min','max','median'])
    .rename(columns=d)
    .reset_index())
df.to_csv(r'C:...output.csv', index=False)

It does what I need perfectly:

Class,Code,Vendor,State,AverageDays,LowestNumberOfDays,HighestNumberOfDays,MedianDays
1,876,Name2,TX,10.043750000666666,5.409722222,12.51041667,12.21111111
2,876,Name1,NY,5.513194444,5.513194444,5.513194444,5.513194444
3,123,Name1,NE,6.773148149,2.354166667,12.58402778,5.38125
3,456,Name2,NY,6.346527778,6.346527778,6.346527778,6.346527778

When I add mode to the code:

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays','mode':'ModeDays'}
df = (df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']
    .agg(['mean','min','max','median','mode'])

I get this errror:

AttributeError: Cannot access callable attribute 'mode' of 'SeriesGroupBy' objects, try using the 'apply' method

When I researched this I see that mode is not a natural function in pandas but my issue is I cannot figure out how to leverage the other ways to calculate mode, and incorporate it in this same step of the code since this step reduces my data set into condensed groups. So I cannot run it after and if I run it before I will need to apply the mode value repeatedly across all the rows of my data for each group. My data is already over 4 gig and 30 million records so I would like to avoid making it bigger.

Upvotes: 1

Views: 1424

Answers (2)

BENY
BENY

Reputation: 323226

You still can do that lambda x: x.mode()[0]

df.groupby(['Class','  Code','   Vendor',' State'])['  NumberOfDays'].agg(['mean','min','max','median',lambda x: x.mode()[0]])
Out[8]: 
                                    mean       min        max     median  \
Class   Code    Vendor  State                                              
1     876        Name2   TX    10.043750  5.409722  12.510417  12.211111   
2     876        Name1   NY     5.513194  5.513194   5.513194   5.513194   
3     123        Name1   NE     6.773148  2.354167  12.584028   5.381250   
      456        Name2   NY     6.346528  6.346528   6.346528   6.346528   
                               <lambda>  
Class   Code    Vendor  State            
1     876        Name2   TX    5.409722  
2     876        Name1   NY    5.513194  
3     123        Name1   NE    2.354167  
      456        Name2   NY    6.346528  

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

You can use value_counts for mode

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays', '<lambda>': 'ModeDays'}

df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']\
    .agg(['mean','min','max','median', lambda x: x.value_counts().index[0]])\
    .rename(columns=d)\
    .reset_index()


    Class   Code    Vendor  State   AverageDays LowestNumberOfDays  HighestNumberOfDays MedianDays  ModeDays
0   1       876     Name2   TX      10.043750   5.409722    12.510417   12.211111   12.510417
1   2       876     Name1   NY      5.513194    5.513194    5.513194    5.513194    5.513194
2   3       123     Name1   NE      6.773148    2.354167    12.584028   5.381250    2.354167
3   3       456     Name2   NY      6.346528    6.346528    6.346528    6.346528    6.346528

Upvotes: 2

Related Questions