Reputation: 63
I am trying to create a column that includes the percentages based on the values in other columns in python. For example, let's assume that we have the following dataset.
+------------------------------------+------------+
| teachers | grades |
+------------------------------------+------------+
| Teacher1 | 1 |
| Teacher1 | 2 |
| Teacher1 | 0 |
| Teacher2 | 1 |
| Teacher2 | 2 |
| Teacher2 | 0 |
| Teacher2 | 2 |
| Teacher3 | 2 |
| Teacher3 | 0 |
| Teacher3 | 1 |
| Teacher3 | 0 |
| Teacher4 | 0 |
| Teacher4 | 0 |
+------------------------------------+------------+
As you can see we have teachers in the first columns. In the second column, we can see the grades that teachers give (0,1 and 2) to students. Here, I am trying to get the percentage of grade 1 and 2 in total given grade for each teacher. For instance, teacher 1 gave one grade 1, one grade 2, and one grade 0. In this case, the percentage of given grade numbers 1 and 2 in the total grade is 66%. So I would like to get the following data frame:
+------------------------------------+------------+------------+
| teachers | grades | percentage |
+------------------------------------+------------+------------+
| Teacher1 | 1 | 66% |
| Teacher1 | 2 | 66% |
| Teacher1 | 0 | 66% |
| Teacher2 | 1 | 75% |
| Teacher2 | 2 | 75% |
| Teacher2 | 0 | 75% |
| Teacher2 | 2 | 75% |
| Teacher3 | 2 | 50% |
| Teacher3 | 0 | 50% |
| Teacher3 | 1 | 50% |
| Teacher3 | 0 | 50% |
| Teacher4 | 0 | 0% |
| Teacher4 | 0 | 0% |
+------------------------------------+------------+------------+
I have tried the following method so far, but it did not work. Could you please help me?
percents = {} #store Teacher:percent
for t, g in df.groupby('teachers'):
total = g.grades.sum()
one_two = g.loc[g.grades.isin([1,2])].counts.sum() #consider only 1&2
percent = (one_two/total)*100
print(t, percent)
percents[t] = [percent]
Upvotes: 0
Views: 317
Reputation: 15738
Please avoid loops when working with numpy/pandas. Here is a vectorized version of it:
percentage = df.groupby('teachers').grades.transform(lambda x: sum(x > 0) / len(x))
The only difference here is .transform
, which takes a function to process the group - you already had everything else in place.
Upvotes: 1