Reputation: 4386
I have a data frame with a "group" variable, a "count" variable, and a "total" variable. For each group I want to sum the count column and divide that by the sum of the total column. How do I do this, ideally in one line of code?
Here is an example to work with:
test_dc = {1:{'group':'A','cnt':3,'total':5},
2:{'group':'B','cnt':1,'total':8},
3:{'group':'A','cnt':2,'total':4},
4:{'group':'B','cnt':6,'total':13}
}
test_df = pd.DataFrame.from_dict(test_dc, orient='index')
Expected output (roughly):
group | average
A | 0.55555
B | 0.33333
Edit: changed column name from "count" to "cnt" because there seems to be an existing count()
method on groupby objects.
Upvotes: 1
Views: 3486
Reputation: 294218
I'd use a combination of agg
and eval
test_df.groupby('group').agg('sum').eval('cnt / total')
group
A 0.555556
B 0.333333
dtype: float64
Upvotes: 2
Reputation: 3846
You can use DataFrame.groupby
to group by a column, and then call sum
on that to get the sums.
>>> df = test_df
.groupby('group')
.sum()
>>> df
count total
group
A 5 9
B 7 21
Then you can grab the column and divide them through to get your answer.
>>> df['count'] / df['total']
group
A 0.555556
B 0.333333
dtype: float64
You can do this in one line by taking advantage of the DataFrame.pipe
operator:
test_df
.groupby('group')
.sum()
.pipe(lambda df: df['count'] / df['total'])
Upvotes: 4