Reputation: 25999
I haven't been able to find a solution online so I'm wondering if anyone has any ideas.
I have 3 columns, a numerical value('value) and 2 categorical columns('fruits' and 'country'). I want to calculate averages and sums of numerical columns for each categorical column combination.
Here's what I can do:
df = pd.DataFrame({'value':[34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],'value1':[54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76], 'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange', 'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'], 'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA', 'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA']})[['value', 'fruit', 'country']]
# print(df)
print(df.groupby(['fruit']).mean()) #sum,etc work great.
value
fruit
Apple 37.000000
Banana 28.500000
Grape 48.000000
Orange 61.333333
Pear 34.000000
Plum 34.000000
Tomato 54.000000
It works for one column but what I'm trying to do is:
I'm confused at how to incorporate multiple columns in the groupby and do premutations.
Here's my failed attempts - any suggestions on how I can do this or letting me know what command can do this and I'll research it.
from itertools import permutations, combinations
perms = [perm for perm in permutations(df[['fruit', 'country']])]
print(perms) #doesn't work, only results in permutations of words fruit/country so 2 sets.
# [('fruit', 'country'), ('country', 'fruit')]
#trying to create combinations using combinations in itertools
g = df.groupby('fruit').apply(lambda x:
pd.Series(list(combinations(x.country, 2))))
df = pd.DataFrame(g.apply(list).tolist(), columns=['country', 'fruit'])
df
#not what I want either this is a combination I was looking for.
# country fruit
# 0 USA UK
# 1 USA USA
# 2 UK USA
# 3 CANADA UK
# 4 UK USA
# 5 USA CANADA
# 6 USA CANADA
# 7 CANADA CANADA
Upvotes: 1
Views: 73
Reputation: 34046
You want this:
Case 1: Groupby one column and Mean of both value
and value1
:
In [1018]: df.groupby('fruit').agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1018]:
fruit value value1
0 Apple 37.000000 40.666667
1 Banana 28.500000 34.000000
2 Grape 48.000000 44.500000
3 Orange 61.333333 54.666667
4 Pear 34.000000 43.000000
5 Plum 34.000000 54.000000
6 Tomato 54.000000 76.000000
Case 2: Groupby multiple columns and do calculations:
In [1017]: df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()
Out[1017]:
fruit country value value1
0 Apple UK 23.0 12.0
1 Apple USA 44.0 55.0
2 Banana CANADA 23.0 34.0
3 Banana UK 34.0 34.0
4 Grape UK 53.0 66.0
5 Grape USA 43.0 23.0
6 Orange CANADA 59.5 70.5
7 Orange USA 65.0 23.0
8 Pear UK 34.0 43.0
9 Plum USA 34.0 54.0
10 Tomato USA 54.0 76.0
Update as per OP's comment:
In [1039]: res = df.groupby(['fruit','country']).agg({'value': 'mean', 'value1': 'mean'}).reset_index()
In [1041]: res['total'] = res.value + res.value1
In [1042]: res
Out[1042]:
fruit country value value1 total
0 Apple UK 23.0 12.0 35.0
1 Apple USA 44.0 55.0 99.0
2 Banana CANADA 23.0 34.0 57.0
3 Banana UK 34.0 34.0 68.0
4 Grape UK 53.0 66.0 119.0
5 Grape USA 43.0 23.0 66.0
6 Orange CANADA 59.5 70.5 130.0
7 Orange USA 65.0 23.0 88.0
8 Pear UK 34.0 43.0 77.0
9 Plum USA 34.0 54.0 88.0
10 Tomato USA 54.0 76.0 130.0
Upvotes: 1
Reputation: 2583
You can use agg function to do different things with your columns
df.groupby(['fruit','country']).agg({'col1':'mean','col2':'median','col3':'sum'})
Upvotes: 0
Reputation: 1023
You can include multiple columns in groupby by using a list as df.groupby([col1, col2])
. In your case, you could then do:
df = pd.DataFrame({'value':[34, 65, 65, 34, 53, 23, 54, 23, 65, 43, 23, 34, 54],'value1':[54, 45, 23, 43, 66, 12, 87, 34, 54, 23, 65, 34, 76], 'fruit': ['Plum', 'Apple', 'Orange', 'Pear', 'Grape', 'Apple', 'Orange', 'Banana', 'Orange', 'Grape', 'Apple', 'Banana', 'Tomato'], 'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'CANADA', 'CANADA', 'CANADA', 'USA', 'USA', 'UK', 'USA']})
df.groupby(['fruit', 'country']).mean().mean(axis=1)
where the first .mean()
calculates the mean for each combination of fruit
and country
and the second .mean(axis=1)
calculates the mean between value
and value1
.
Output:
fruit country
Apple UK 17.5
USA 49.5
Banana CANADA 28.5
UK 34.0
Grape UK 59.5
USA 33.0
Orange CANADA 65.0
USA 44.0
Pear UK 38.5
Plum USA 44.0
Tomato USA 65.0
Upvotes: 1