Reputation: 1466
I have a dataframe as the following:
Member Category Total
1001 1 5
1001 2 4
1001 3 9
1003 1 7
1003 2 5
1003 3 2
1005 1 2
1005 3 5
I need to get:
Member Category Total Average
1001 1 5 0.27
1001 2 4 0.22
1001 3 9 0.5
1003 1 7 0.5
1003 2 5 0.35
1003 3 2 0.15
1005 1 2 0.28
1005 3 5 0.72
That is, the average of totals for each member. For instance, the member 1001 has a total of 18, in which the category 1 represents 27% of the total. Therefore the average would be 0.27.
What I tried was:
average = []
for member in df[df["Member"].unique():
total_member = df[df["Member"] == member]["Total"].sum()
for category in df["Category"].unique():
total_category = df[(df["Member"]==member) & (df["Category"]==category)]["Total"]
average.append(total_category/total_member)
df["Average"] = average
However, not only does it not work, but since I have a very large amount of data, it is too slow.
Upvotes: 1
Views: 202
Reputation: 863471
Use transform
for sum
for divide by column Total
:
df['Average'] = df['Total'] / df.groupby('Member')['Total'].transform('sum')
print (df)
Member Category Total Average
0 1001 1 5 0.277778
1 1001 2 4 0.222222
2 1001 3 9 0.500000
3 1003 1 7 0.500000
4 1003 2 5 0.357143
5 1003 3 2 0.142857
6 1005 1 2 0.285714
7 1005 3 5 0.714286
Detail:
print (df.groupby('Member')['Total'].transform('sum'))
0 18
1 18
2 18
3 14
4 14
5 14
6 7
7 7
Name: Total, dtype: int64
Alternative solution:
df['Average'] = df['Total'] / df['Member'].map(df.groupby('Member')['Total'].sum())
Timings:
np.random.seed(123)
N = 100000
L = ['AV','DF','SD','RF','F','WW','FG','SX']
dates = pd.date_range('2015-01-01', '2015-02-20')
df = pd.DataFrame(np.random.randint(100, size=(N, 3)), columns=['Member','Category','Total'])
df = df.sort_values(['Member','Category']).reset_index(drop=True)
#Wen solution
In [395]: %timeit df.groupby('Member').Total.apply(lambda x : x/sum(x))
10 loops, best of 3: 31.2 ms per loop
In [396]: %timeit df['Total'] / df.groupby('Member')['Total'].transform('sum')
100 loops, best of 3: 5.11 ms per loop
#alternative a bit slowier solution
In [397]: %timeit df['Total'] / df['Member'].map(df.groupby('Member')['Total'].sum())
100 loops, best of 3: 9.92 ms per loop
Upvotes: 2
Reputation: 323366
df['ave']=df.groupby('Member').Total.apply(lambda x : x/sum(x))
df
Out[318]:
Member Category Total ave
0 1001 1 5 0.277778
1 1001 2 4 0.222222
2 1001 3 9 0.500000
3 1003 1 7 0.500000
4 1003 2 5 0.357143
5 1003 3 2 0.142857
6 1005 1 2 0.285714
7 1005 3 5 0.714286
Upvotes: 1