Reputation: 605
I'm trying to write a simple program to track my attandance to my two university courses this semester. I need to have at least an 80% of attendance to each class of each subject. Each subject has three classes (T,TP and P).
After tinkering for a while I've come to find the groupby pandas function:
import pandas as pd
import sys
df = pd.read_csv(sys.argv[1])
g = df.groupby(['Subject','Class']).count()
print(g)
The source file has a row for each class where I'll note "Presente" when I go. I will also note "Sin Clases" when there is no class due to some external reason. This is what top 5 rows of the source file looks like:
Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sin clases Movilización IVE
4 8/8/2018 4 Logica P NaN Movilización IVE
This is the current output. What I'm not able to figure is how to perform some kind of operation and add a column. For example, adding a column counting only those rows with value "Presente" for each subgroup. I find myself complicating things too much, i'm afrading i'm missing something.
z:\devs\py\asist>python asistb.py asist2q.csv
Date Day Attendance Comments
Subject Class
Logica P 32 32 3 4
T 16 16 4 3
TP 16 16 1 1
Medieval P 16 16 0 0
T 32 32 5 4
TP 16 16 1 1
A tip or hint on how to approach the problem is greatly appreciated. Thanks a lot!
Upvotes: 1
Views: 72
Reputation: 1272
Given the example df
as follows
Date Day Subject Class Attendance Comments
0 6/8/2018 2 Medieval T Presente NaN
1 6/8/2018 2 Logica T Presente NaN
2 6/8/2018 2 Logica P NaN NaN
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE
4 8/8/2018 4 Logica P NaN MovilizaciónIVE
In order to count the occurances of Presente
from the Attendance
column, you use a transform
method with string search
and count
operation and add the result in a new column as follows
df['counts'] = df.groupby(['Subject', 'Class'])['Attendance'].transform(
lambda x: x[x.str.contains('Presente', na=False)].count())
This results the following output
Date Day Subject Class Attendance Comments counts
0 6/8/2018 2 Medieval T Presente NaN 1
1 6/8/2018 2 Logica T Presente NaN 1
2 6/8/2018 2 Logica P NaN NaN 0
3 8/8/2018 4 Logica TP Sinclases MovilizaciónIVE 0
4 8/8/2018 4 Logica P NaN MovilizaciónIVE 0
To calculate the total number of Presente per Subject, you can use groupby
again and sum up the counts column
df.groupby(['Subject'])['counts'].sum()
The output is
Subject
Logica 1
Medieval 1
Name: counts, dtype: int32
Source:pandas groupby count string occurrence over column
Upvotes: 1
Reputation: 323396
Seems like you need unstack
+ stack
df.groupby(['Subject','Class']).count().unstack(fill_value=0).stack()
Out[296]:
Date Day Attendance Comments
Subject Class
Logica P 2 2 0 1
T 1 1 1 0
TP 1 1 1 1
Medieval P 0 0 0 0
T 1 1 1 0
TP 0 0 0 0
Upvotes: 3