Reputation: 161
Basically, I would like to create a column ERROR RATE
that takes the sum of the YES
's in ERROR
and divides it by the NO
's in ERROR
, but to have that ERROR RATE
filtered per LOCATION
per DESCRIPTION
.
df = pd.DataFrame({'ERROR' : ['YES','YES','NO','NO','YES'],
'LOCATION' : ['London', 'New York', 'Edinburgh', 'Paris', 'Berlin'],
'DESCRIPTION' : ['Good', 'Bad', 'Good', 'Bad', 'Bad']})
I've tried writing code along the lines:
df.groupby(['DESCRIPTION','LOCATION'])['ERROR'].agg()
But I'm unsure if a) This is even the correct approach and b) What kind of function I need to write for the agg()
Thanks in advance
Upvotes: 1
Views: 40
Reputation: 402493
Maybe you're looking for groupby
+ value_counts
+ unstack
+ div
?
i = df.groupby(['DESCRIPTION' ,'LOCATION']).ERROR.value_counts().unstack(-1)
j = (i.YES / i.NO).to_frame('ERROR_RATE')
j
ERROR_RATE
DESCRIPTION LOCATION
Bad Berlin ...
New York ...
Paris ...
Good Edinburgh ...
London ...
If there are no counts for either YES
or NO
for a particular group, then the resulting columns will have NaN
for those entries in the final result.
If you're using agg
, you need a 3-way groupby.
i = df.groupby(['DESCRIPTION','LOCATION','ERROR']).ERROR.count().unstack()
j = (i.YES / i.NO).to_frame('ERROR_RATE')
j
ERROR_RATE
DESCRIPTION LOCATION
Bad Berlin ...
New York ...
Paris ...
Good Edinburgh ...
London ...
count
is the same as agg('count')
.
Upvotes: 2