Reputation: 3713
Given the following data frame:
import pandas as pd

DF = pd.DataFrame({'Site': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Score': [1, -1, -0.5, 1, 0, -1, 2, 4],
                   'Group': [1, 1, 2, 2, 1, 1, 2, 2]})
DF
Group Score Site
0 1 1.0 A
1 1 -1.0 A
2 2 -0.5 A
3 2 1.0 A
4 1 0.0 B
5 1 -1.0 B
6 2 2.0 B
7 2 4.0 B
I'd like to add a column that shows the fraction of rows per site that have a score at or above 0 (e.g. 3 of 4 rows in site B are at or above zero, so the result is 75%), and another column that shows the same fraction by group within each site (e.g. Group 1 in site A has 1 score out of 2 at or above zero, so the result is 50%). The desired result is as follows:
Group Score Site Site% SiteGroup%
0 1 1.0 A 0.5 0.5
1 1 -1.0 A 0.5 0.5
2 2 -0.5 A 0.5 0.5
3 2 1.0 A 0.5 0.5
4 1 0.0 B 0.75 0.5
5 1 -1.0 B 0.75 0.5
6 2 2.0 B 0.75 1
7 2 4.0 B 0.75 1
Thanks in advance!
Upvotes: 1
Views: 398
Reputation: 42875
You could try:
df = DF.copy()  # the question's frame is named DF
df['score_indicator'] = (df['Score'] >= 0).astype(int)  # 1 if Score >= 0, else 0
df['Site%'] = df.groupby('Site')['score_indicator'].transform('mean')  # share of 1s per site
df['Group%'] = df.groupby(['Site', 'Group'])['score_indicator'].transform('mean')  # share of 1s per site and group
to get
print(df)
Group Score Site score_indicator Site% Group%
0 1 1.0 A 1 0.50 0.5
1 1 -1.0 A 0 0.50 0.5
2 2 -0.5 A 0 0.50 0.5
3 2 1.0 A 1 0.50 0.5
4 1 0.0 B 1 0.75 0.5
5 1 -1.0 B 0 0.75 0.5
6 2 2.0 B 1 0.75 1.0
7 2 4.0 B 1 0.75 1.0
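If the helper column is not wanted in the result, the same idea can probably be written without it. The following is only a sketch, assuming the frame is named DF as in the question and that the new columns should be called Site% and SiteGroup% as in the desired output; the name non_negative is made up for illustration. It relies on the mean of a 0/1 series being the fraction of ones.

```python
import pandas as pd

# Same data as in the question.
DF = pd.DataFrame({'Site': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Score': [1, -1, -0.5, 1, 0, -1, 2, 4],
                   'Group': [1, 1, 2, 2, 1, 1, 2, 2]})

# 1 where the score is at or above zero, 0 otherwise.
# 'non_negative' is an illustrative name, not part of the original answer.
non_negative = (DF['Score'] >= 0).astype(int)

# Group the indicator by the frame's own columns and broadcast each
# group's mean back onto the original rows.
DF['Site%'] = non_negative.groupby(DF['Site']).transform('mean')
DF['SiteGroup%'] = non_negative.groupby([DF['Site'], DF['Group']]).transform('mean')

print(DF)
```

Because the indicator is kept in a separate Series and grouped by DF's columns, nothing extra has to be dropped from DF afterwards.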
Upvotes: 1