Rami.K

Reputation: 199

Average of Dataframe columns

I want to get the average GDP of each country across the years. The columns 2006, 2007, ..., 2015 contain the GDP numbers. My code returns an error saying that mean(axis=1) needs at least 1 argument, even though 1 has been assigned to it, which is weird. I also find it odd that we use mean instead of avg, but I couldn't find an avg function for groupby.

Here is my code:

    Top15 = ANSWER
    Top15 = Top15[['Country', '2006', '2007', '2008', '2009', '2010', 
    '2011', '2012', '2013', '2014', '2015']]
    return Top15.groupby('Country').agg({"avg": np.mean(axis=1)})

Upvotes: 2

Views: 8358

Answers (3)

Marco Neumann

Reputation: 688

There are multiple problems with your code:

  1. .agg with a dict maps input column names to aggregation functions, e.g. .agg({'2006': 'mean'}). "avg" is not a column in your frame.
  2. np.mean(axis=1) calls np.mean immediately without any input array, which is why it raises the error. Passing a callable instead, e.g. .agg({'2006': lambda x: np.mean(x)}), would work.
  3. The easiest way is Top15.groupby('Country').mean() (read it as "group by Country and, for each group, calculate the mean (average)").
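
To make the three points concrete, here is a minimal sketch. The frame `top15` and its values are made up for illustration (they stand in for the asker's Top15), and only two year columns are used to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's Top15 frame (values are invented).
top15 = pd.DataFrame({
    "Country": ["UK", "UK", "US"],
    "2006": [1.0, 3.0, 2.0],
    "2007": [5.0, 7.0, 4.0],
})

grouped = top15.groupby("Country")

# Dict form: keys are existing column names, values say how to aggregate them.
by_name = grouped.agg({"2006": "mean", "2007": "mean"})

# A callable also works, because pandas passes each group's values into it.
by_callable = grouped.agg({"2006": lambda x: np.mean(x)})

# Shortcut: the mean of every remaining column, per group.
by_shortcut = grouped.mean()
```

Note that all three variants average the rows within each group, one result per year column. They do not average across the year columns.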

Upvotes: 0

ItayBenHaim

Reputation: 183

Use mean()

    Top15 = ANSWER
    Top15 = Top15[['Country', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015']]
    return Top15.groupby('Country').mean()

Upvotes: 0

jpp

Reputation: 164623

GroupBy is not necessary here, as you are performing a row-wise calculation rather than aggregating over groups of rows. You can just use pd.DataFrame.mean with axis=1. Here's a minimal example:

import pandas as pd

df = pd.DataFrame({'Country': ['UK', 'US'],
                   '2006': [1, 2],
                   '2007': [3, 4],
                   '2008': [5, 6]})

# axis=1 averages across the selected columns, one result per row
df['mean'] = df[['2006', '2007', '2008']].mean(axis=1)

print(df)

   2006  2007  2008 Country  mean
0     1     3     5      UK   3.0
1     2     4     6      US   4.0
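
If the result should be keyed by country rather than stored as an extra column, one possible variant is sketched below. It assumes one row per country; the frame `top15`, its values, and the name `avgGDP` are hypothetical, and the asker's real frame would list the years 2006 through 2015.

```python
import pandas as pd

# Hypothetical data in the asker's layout; GDP values are invented.
top15 = pd.DataFrame({
    "Country": ["UK", "US"],
    "2006": [1.0, 2.0],
    "2007": [3.0, 4.0],
    "2008": [5.0, 6.0],
})

years = ["2006", "2007", "2008"]  # the asker's frame would use 2006..2015

# axis=1 averages across the year columns, giving one value per country row.
avg_gdp = top15.set_index("Country")[years].mean(axis=1).rename("avgGDP")
```

The result is a Series indexed by Country, so a single value can be looked up with `avg_gdp["UK"]`.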

Upvotes: 5
