Python Pandas trying to get the mode with my other statitics

Question

I have csv data:

Class,  Code,   Vendor, State,  NumberOfDays
3,      123,    Name1,  NE,     12.58402778
1,      876,    Name2,  TX,     12.51041667
3,      123,    Name1,  NE,     2.354166667
1,      876,    Name2,  TX,     12.21111111
3,      456,    Name2,  NY,     6.346527778
2,      876,    Name1,  NY,     5.513194444
3,      123,    Name1,  NE,     5.38125
1,      876,    Name2,  TX,     5.409722222

And I have the follow code that gives me the mean, min, max and median of the NumberOfDays field based on groupings of the other fields

df = pd.read_csv(r'C:...input.csv')

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays'}
df = (df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']
    .agg(['mean','min','max','median'])
    .rename(columns=d)
    .reset_index())
df.to_csv(r'C:...output.csv', index=False)

It does what I need perfectly:

Class,Code,Vendor,State,AverageDays,LowestNumberOfDays,HighestNumberOfDays,MedianDays
1,876,Name2,TX,10.043750000666666,5.409722222,12.51041667,12.21111111
2,876,Name1,NY,5.513194444,5.513194444,5.513194444,5.513194444
3,123,Name1,NE,6.773148149,2.354166667,12.58402778,5.38125
3,456,Name2,NY,6.346527778,6.346527778,6.346527778,6.346527778

When I add mode to the code:

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays','mode':'ModeDays'}
df = (df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']
    .agg(['mean','min','max','median','mode'])

I get this errror:

AttributeError: Cannot access callable attribute 'mode' of 'SeriesGroupBy' objects, try using the 'apply' method

When I researched this I see that mode is not a natural function in pandas but my issue is I cannot figure out how to leverage the other ways to calculate mode, and incorporate it in this same step of the code since this step reduces my data set into condensed groups. So I cannot run it after and if I run it before I will need to apply the mode value repeatedly across all the rows of my data for each group. My data is already over 4 gig and 30 million records so I would like to avoid making it bigger.

Vaishali · Accepted Answer

You can use value_counts for mode

d = {'mean':'AverageDays','min':'LowestNumberOfDays','max':'HighestNumberOfDays','median':'MedianDays', '': 'ModeDays'}

df.groupby(['Class','Code','Vendor','State'])['NumberOfDays']\
    .agg(['mean','min','max','median', lambda x: x.value_counts().index[0]])\
    .rename(columns=d)\
    .reset_index()


    Class   Code    Vendor  State   AverageDays LowestNumberOfDays  HighestNumberOfDays MedianDays  ModeDays
0   1       876     Name2   TX      10.043750   5.409722    12.510417   12.211111   12.510417
1   2       876     Name1   NY      5.513194    5.513194    5.513194    5.513194    5.513194
2   3       123     Name1   NE      6.773148    2.354167    12.584028   5.381250    2.354167
3   3       456     Name2   NY      6.346528    6.346528    6.346528    6.346528    6.346528

Python Pandas trying to get the mode with my other statitics

Answers (2)

Related Questions