Matias Salimbene
Matias Salimbene

Reputation: 605

Performing operations on data and adding column to groupby results

I'm trying to write a simple program to track my attandance to my two university courses this semester. I need to have at least an 80% of attendance to each class of each subject. Each subject has three classes (T,TP and P).

After tinkering for a while I've come to find the groupby pandas function:

import pandas as pd
import sys
df = pd.read_csv(sys.argv[1])
g = df.groupby(['Subject','Class']).count()
print(g)

The source file has a row for each class where I'll note "Presente" when I go. I will also note "Sin Clases" when there is no class due to some external reason. This is what top 5 rows of the source file looks like:

Date          Day   Subject Class  Attendance          Comments
0  6/8/2018    2  Medieval     T    Presente               NaN
1  6/8/2018    2    Logica     T    Presente               NaN
2  6/8/2018    2    Logica     P         NaN               NaN
3  8/8/2018    4    Logica    TP  Sin clases  Movilización IVE
4  8/8/2018    4    Logica     P         NaN  Movilización IVE

This is the current output. What I'm not able to figure is how to perform some kind of operation and add a column. For example, adding a column counting only those rows with value "Presente" for each subgroup. I find myself complicating things too much, i'm afrading i'm missing something.

z:\devs\py\asist>python asistb.py asist2q.csv
                 Date  Day  Attendance  Comments
Subject  Class
Logica   P        32   32           3         4
         T        16   16           4         3
         TP       16   16           1         1
Medieval P        16   16           0         0
         T        32   32           5         4
         TP       16   16           1         1

A tip or hint on how to approach the problem is greatly appreciated. Thanks a lot!

Upvotes: 1

Views: 72

Answers (2)

Raja Sattiraju
Raja Sattiraju

Reputation: 1272

Given the example df as follows

Date    Day     Subject         Class   Attendance  Comments
0   6/8/2018    2   Medieval    T       Presente    NaN
1   6/8/2018    2   Logica      T       Presente    NaN
2   6/8/2018    2   Logica      P       NaN         NaN
3   8/8/2018    4   Logica      TP      Sinclases   MovilizaciónIVE
4   8/8/2018    4   Logica      P       NaN         MovilizaciónIVE

In order to count the occurances of Presente from the Attendance column, you use a transform method with string search and count operation and add the result in a new column as follows

df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
    lambda x: x[x.str.contains('Presente', na=False)].count())

This results the following output

    Date    Day     Subject     Class   Attendance  Comments         counts
0   6/8/2018    2   Medieval    T       Presente    NaN              1
1   6/8/2018    2   Logica      T       Presente    NaN              1
2   6/8/2018    2   Logica      P       NaN         NaN              0
3   8/8/2018    4   Logica      TP      Sinclases   MovilizaciónIVE  0
4   8/8/2018    4   Logica      P       NaN         MovilizaciónIVE  0

To calculate the total number of Presente per Subject, you can use groupby again and sum up the counts column

df.groupby(['Subject'])['counts'].sum()

The output is

Subject
Logica      1
Medieval    1
Name: counts, dtype: int32

Source:pandas groupby count string occurrence over column

Upvotes: 1

BENY
BENY

Reputation: 323396

Seems like you need unstack + stack

df.groupby(['Subject','Class']).count().unstack(fill_value=0).stack()
Out[296]: 
                Date  Day  Attendance  Comments
Subject  Class                                 
Logica   P         2    2           0         1
         T         1    1           1         0
         TP        1    1           1         1
Medieval P         0    0           0         0
         T         1    1           1         0
         TP        0    0           0         0

Upvotes: 3

Related Questions