Dany
Dany

Reputation: 63

Creating a percentage column based on the values in a column in python

I am trying to create a column that includes the percentages based on the values in other columns in python. For example, let's assume that we have the following dataset.

+------------------------------------+------------+
|              teachers              |  grades    |
+------------------------------------+------------+
| Teacher1                           |     1      |
| Teacher1                           |     2      |
| Teacher1                           |     0      |
| Teacher2                           |     1      |
| Teacher2                           |     2      |
| Teacher2                           |     0      |
| Teacher2                           |     2      |
| Teacher3                           |     2      |
| Teacher3                           |     0      |
| Teacher3                           |     1      |
| Teacher3                           |     0      |
| Teacher4                           |     0      |
| Teacher4                           |     0      |
+------------------------------------+------------+

As you can see we have teachers in the first columns. In the second column, we can see the grades that teachers give (0,1 and 2) to students. Here, I am trying to get the percentage of grade 1 and 2 in total given grade for each teacher. For instance, teacher 1 gave one grade 1, one grade 2, and one grade 0. In this case, the percentage of given grade numbers 1 and 2 in the total grade is 66%. So I would like to get the following data frame:

+------------------------------------+------------+------------+
|              teachers              |  grades    | percentage |
+------------------------------------+------------+------------+
| Teacher1                           |     1      |     66%    |
| Teacher1                           |     2      |     66%    |
| Teacher1                           |     0      |     66%    |
| Teacher2                           |     1      |     75%    |
| Teacher2                           |     2      |     75%    |
| Teacher2                           |     0      |     75%    |
| Teacher2                           |     2      |     75%    |
| Teacher3                           |     2      |     50%    |
| Teacher3                           |     0      |     50%    |
| Teacher3                           |     1      |     50%    |
| Teacher3                           |     0      |     50%    |
| Teacher4                           |     0      |     0%     |
| Teacher4                           |     0      |     0%     |
+------------------------------------+------------+------------+

I have tried the following method so far, but it did not work. Could you please help me?

percents = {} #store Teacher:percent
for t, g in df.groupby('teachers'):
    total = g.grades.sum()
    one_two = g.loc[g.grades.isin([1,2])].counts.sum() #consider only 1&2
    percent = (one_two/total)*100
    print(t, percent)
    percents[t] = [percent]

Upvotes: 0

Views: 317

Answers (1)

Marat
Marat

Reputation: 15738

Please avoid loops when working with numpy/pandas. Here is a vectorized version of it:

percentage = df.groupby('teachers').grades.transform(lambda x: sum(x > 0) / len(x))

The only difference here is .transform, which takes a function to process the group - you already had everything else in place.

Upvotes: 1

Related Questions