Reputation: 173
I have a DF that looks like this:
trainee | course | completed | days overdue
Ava ABC Yes 0
Bob ABC Yes 1
Charlie DEF No 10
David DEF Yes 0
Emily DEF Yes 0
Finn GHI Yes 0
I need to create a DF that tells me how many courses were taken, how many were taken in time (i.e. with 0 overdue days), and what's the percentage of completeness.
I.e. the result should look like this:
course | count | in time | % completed in time
ABC 2 1 0.5
DEF 3 2 0.66
GHI 1 1 1
How can I do it with Pandas?
Thanks!
FP
p.s. here's the code to generate the input DF:
df = pd.DataFrame({'Trainee': ['Ava','Bob','Charlie','David','Emily','Finn'],'Course':['ABC','ABC','DEF','DEF','DEF','GHI'],'Completed': ['Yes','Yes','No','Yes','Yes','Yes'],'Days overdue':[0,1,10,0,0,0]})
Upvotes: 3
Views: 5221
Reputation: 28644
one option, which offers a modest speed up, is to build an array of 1s and 0s for the days overdue, before grouping:
temp = df.assign(d = np.where(df['Days overdue'] == 0, 1, 0)).groupby('Course')
count = temp.Trainee.size()
in_time = temp.d.sum()
completed_in_time = in_time/count
pd.DataFrame({'count':count,
'in time' : in_time,
'% completed in time' : completed_in_time
})
count in time % completed in time
Course
ABC 2 1 0.500000
DEF 3 2 0.666667
GHI 1 1 1.000000
Upvotes: 0
Reputation: 862511
Use agg
fot total and count 0
s per groups, last divide columns by div
:
tups = [('count', 'size'), ('in time', lambda x: (x==0).sum())]
df = df.groupby('course')['days overdue'].agg(tups).reset_index()
df['% completed in time'] = df['in time'].div(df['count'])
print (df)
course count in time % completed in time
0 ABC 2 1 0.500000
1 DEF 3 2 0.666667
2 GHI 1 1 1.000000
Upvotes: 5