Reputation: 7
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame([[1,'aaa',50],[0,'aaa',1000],[0,'aba',30],[1,'aaa',50],[1,'aba',10]],
                  columns=['A','B','C'])
df
   A    B     C
0  1  aaa    50
1  0  aaa  1000
2  0  aba    30
3  1  aaa    50
4  1  aba    10
For each item in 'B' (items can repeat), I want to check its values in 'A'. For the rows where 'A' is 1, it should calculate the sum of the values in 'C' for that item. For the rows where 'A' is 0, it should count the number of rows for that item. The final result would then be: sum / count.
In the end, I want to show the result like this:
    ID  Value
0  aaa    100
1  aba     10
For example, 'aaa' has two rows where 'A' is 1, whose 'C' values sum to 50 + 50 = 100, and one row where 'A' is 0, so its count is 1. The result is therefore 100 / 1 = 100.
How can I do something like that in an efficient way? I tried to use groupby() and have the sum and count in different dataframes, but I don't know how to compare them and get this result.
Upvotes: 0
Views: 442
Reputation: 35686
Try groupby aggregate on columns A and B, while summing and sizing the C column. Then divide the A==1 'sum' by the A==0 'count':
new_df = df.groupby(['A', 'B']).aggregate(sum=('C', 'sum'), count=('C', 'size'))
new_df = (new_df.loc[1, 'sum'] / new_df.loc[0, 'count']).reset_index()
new_df.columns = ['ID', 'Value'] # Rename Columns
new_df:
    ID  Value
0  aaa  100.0
1  aba   10.0
*Beware of division by 0: it is possible that a given B value has no entries in one of the A groups.
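To illustrate this caveat, here is a minimal sketch, assuming one extra hypothetical row ('abc', which only has A == 1) that is not in the original data. With a plain groupby, a B value with no A == 0 rows is simply absent from the counts, so the label-aligned division gives NaN for it. Reindexing makes the zero count explicit, and masking it avoids dividing by 0:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'aaa', 50], [0, 'aaa', 1000], [0, 'aba', 30],
     [1, 'aaa', 50], [1, 'aba', 10],
     [1, 'abc', 7]],  # hypothetical row: 'abc' has no A == 0 entries
    columns=['A', 'B', 'C'],
)

# Sum of C where A == 1, and number of rows where A == 0, both indexed by B
sums = df[df['A'] == 1].groupby('B')['C'].sum()
counts = df[df['A'] == 0].groupby('B').size()

# Give every B present in `sums` a count, using 0 where no A == 0 rows exist
counts = counts.reindex(sums.index, fill_value=0)

# Replace zero counts with NaN so the division yields NaN instead of inf
ratio = sums.div(counts.where(counts > 0))

result = ratio.rename_axis('ID').reset_index(name='Value')
print(result)
```

Whether such a group should end up as NaN, 0, or be dropped depends on what the result is used for; `result.dropna()` or `result.fillna(0)` would be two possible follow-ups.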
Upvotes: 1
Reputation: 2647
In [90]: df[df['A'] == 1].groupby('B')['C'].sum() / df[df['A'] == 0].groupby('B').size()
Out[90]:
B
aaa    100.0
aba     10.0
dtype: float64
This should take care of dividing correctly, as both Series are indexed by the column 'B' because of the grouping, so the division is aligned on those labels.
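As a small illustration of that alignment, the sketch below uses two hand-built Series (the values and the reversed ordering are made up for the example, not taken from the question). pandas matches the entries by index label, not by position:

```python
import pandas as pd

# Hypothetical per-group sums and counts, deliberately listed in different orders
sums = pd.Series([100, 10], index=pd.Index(['aaa', 'aba'], name='B'))
counts = pd.Series([2, 1], index=pd.Index(['aba', 'aaa'], name='B'))

# Division pairs 'aaa' with 'aaa' and 'aba' with 'aba', regardless of position
ratio = sums / counts

print(ratio['aaa'])  # 100 / 1
print(ratio['aba'])  # 10 / 2
```

A label that appears in only one of the two Series would come out as NaN in the result.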
Upvotes: 1
Reputation: 9207
You can do a groupby and select the right group:
import pandas as pd
df_grouped = df.groupby(['A', 'B']).sum().loc[1]
       C
B
aaa  100
aba   10
Upvotes: 0