Lostsoul
Lostsoul

Reputation: 25999

Pandas - how can I groupby different columns to calculate averages for the groups?

I haven't been able to find a solution online so I'm wondering if anyone has any ideas.

I have 3 columns, a numerical value('value) and 2 categorical columns('fruits' and 'country'). I want to calculate averages and sums of numerical columns for each categorical column combination.

Here's what I can do:

df = pd.DataFrame({'value':[34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],'value1':[54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76], 'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange', 'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'], 'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA', 'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA']})[['value', 'fruit', 'country']]
# print(df)

print(df.groupby(['fruit']).mean()) #sum,etc work great.
            value
fruit            
Apple   37.000000
Banana  28.500000
Grape   48.000000
Orange  61.333333
Pear    34.000000
Plum    34.000000
Tomato  54.000000

It works for one column but what I'm trying to do is:

  1. Mean of both value and value2(I want to understand how to do single or multiple calculations based on group by).
  2. I want to not just groupby one column, fruit, but I want to do a combination of fruit/country so average of value/value1 grapes in both UK, USA, CANADA.

I'm confused at how to incorporate multiple columns in the groupby and do premutations.

Here's my failed attempts - any suggestions on how I can do this or letting me know what command can do this and I'll research it.

from itertools import permutations, combinations

perms = [perm for perm in permutations(df[['fruit', 'country']])]
print(perms) #doesn't work, only results in permutations of words fruit/country so 2 sets.
# [('fruit', 'country'), ('country', 'fruit')]

#trying to create combinations using combinations in itertools
g = df.groupby('fruit').apply(lambda x: 
      pd.Series(list(combinations(x.country, 2))))

df = pd.DataFrame(g.apply(list).tolist(), columns=['country', 'fruit'])
df
#not what I want either this is a combination I was looking for.

# country   fruit
# 0 USA UK
# 1 USA USA
# 2 UK  USA
# 3 CANADA  UK
# 4 UK  USA
# 5 USA CANADA
# 6 USA CANADA
# 7 CANADA  CANADA

Upvotes: 1

Views: 73

Answers (3)

Mayank Porwal
Mayank Porwal

Reputation: 34046

You want this:

Case 1: Groupby one column and Mean of both value and value1:

   In [1018]: df.groupby('fruit').agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1018]: 
    fruit      value     value1
0   Apple  37.000000  40.666667
1  Banana  28.500000  34.000000
2   Grape  48.000000  44.500000
3  Orange  61.333333  54.666667
4    Pear  34.000000  43.000000
5    Plum  34.000000  54.000000
6  Tomato  54.000000  76.000000

Case 2: Groupby multiple columns and do calculations:

In [1017]: df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1017]: 
     fruit country  value  value1
0    Apple      UK   23.0    12.0
1    Apple     USA   44.0    55.0
2   Banana  CANADA   23.0    34.0
3   Banana      UK   34.0    34.0
4    Grape      UK   53.0    66.0
5    Grape     USA   43.0    23.0
6   Orange  CANADA   59.5    70.5
7   Orange     USA   65.0    23.0
8     Pear      UK   34.0    43.0
9     Plum     USA   34.0    54.0
10  Tomato     USA   54.0    76.0

Update as per OP's comment:

In [1039]: res =  df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()

In [1041]: res['total'] = res.value + res.value1

In [1042]: res
Out[1042]: 
     fruit country  value  value1  total
0    Apple      UK   23.0    12.0   35.0
1    Apple     USA   44.0    55.0   99.0
2   Banana  CANADA   23.0    34.0   57.0
3   Banana      UK   34.0    34.0   68.0
4    Grape      UK   53.0    66.0  119.0
5    Grape     USA   43.0    23.0   66.0
6   Orange  CANADA   59.5    70.5  130.0
7   Orange     USA   65.0    23.0   88.0
8     Pear      UK   34.0    43.0   77.0
9     Plum     USA   34.0    54.0   88.0
10  Tomato     USA   54.0    76.0  130.0

Upvotes: 1

Mehdi Golzadeh
Mehdi Golzadeh

Reputation: 2583

You can use agg function to do different things with your columns

df.groupby(['fruit','country']).agg({'col1':'mean','col2':'median','col3':'sum'})

Upvotes: 0

Marcus
Marcus

Reputation: 1023

You can include multiple columns in groupby by using a list as df.groupby([col1, col2]). In your case, you could then do:

df = pd.DataFrame({'value':[34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],'value1':[54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76], 'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange', 'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'], 'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA', 'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA']})

df.groupby(['fruit', 'country']).mean().mean(axis=1)

where the first .mean() calculates the mean for each combination of fruit and country and the second .mean(axis=1) calculates the mean between value and value1.

Output:

fruit   country
Apple   UK         17.5
        USA        49.5
Banana  CANADA     28.5
        UK         34.0
Grape   UK         59.5
        USA        33.0
Orange  CANADA     65.0
        USA        44.0
Pear    UK         38.5
Plum    USA        44.0
Tomato  USA        65.0

Upvotes: 1

Related Questions