Reputation: 13
I have a dataset in the following format
df = pd.DataFrame([[1, 'Label1', 0, 8, 2], [1, 'Label3', 0, 20, 5], [2, 'Label5', 1, 20, 2], [2, 'Label4', 1, 11, 0],
[5, 'Label2', 0, 0, -4],[1, 'Label2', 1, 8, 2], [2, 'Label5', 0, 20, 5], [3, 'Label2', 1, 20, 2], [4, 'Label4', 0, 1, 0],
[5, 'Label3', 0, 1, -4],[1, 'Label3', 1, 8, 2], [2, 'Label4', 0, 20, 5], [3, 'Label1', 1, 20, 2], [4, 'Label3', 0, 1, 0],
[5, 'Label4', 0, 1, -4],[1, 'Label4', 1, 8, 2], [2, 'Label3', 0, 20, 5], [3, 'Label3', 1, 20, 2], [4, 'Label5', 0, 1, 0],
[5, 'Label5', 0, 1, -4]],
columns=['ID', 'Label', 'Status', 'Coeff', 'result'])
cm = {'TP': 0,'FP': 0}
For each ID
in df, I would like to find the number of times the column Coeff
is greater than Result
when Status
column is 1. If this count is greater than 3 then TP
should be incremented by 1 and if it less than 3, then FP
should be incremented by 1.
Example: When ID
is 1111 and Status
1, if the Coeff
column is greater than Result
column twice for that particular ID, then FP must be incremented by 1.
I tried to add a new column called count for each ID and assigned a value as 1 everytime the column Coeff
was greater than Result
.
for ID in df.groupby('ID'):
df.loc[(df['Coeff'] > df['Result']), 'count'] = 1
df_new = list(df[['ID','count']].groupby(df['ID']))
Then I thought of finding whether count
has the number 1 in it. If it does, then increment TP
. Otherwise, increment FP
.
But I couldn't achieve it.
How do I get the required result?
Upvotes: 1
Views: 202
Reputation: 402844
A simple grouping operation on a masked comparison should do:
v = df.Coeff.gt(df.result).where(df.Status.astype(bool)).groupby(df.ID).sum()
Or (to retain dtype=int
, thanks piR!),
v = df.Coeff.gt(df.result).where(df.Status.astype(bool), 0).groupby(df.ID).sum()
v # second expression result
ID
1 3
2 2
3 3
4 0
5 0
dtype: int64
Now,
cm['TP'] = v.gt(3).sum()
cm['FP'] = v.lt(3).sum()
Details
df.Coeff.gt(df.result)
returns a mask. Now, hide all those values for which df.Status
is not 1. This is done using (df.Coeff > df.result).where(df.Status.astype(bool))
. Finally, take this masked result, and group on ID
, followed by a sum to get your result.
Upvotes: 1