pandas groupby and update the sum of the number of times the values in one column is greater than the other column

Question

I have a dataset in the following format

df = pd.DataFrame([[1, 'Label1', 0, 8, 2], [1, 'Label3', 0, 20, 5], [2, 'Label5', 1, 20, 2], [2, 'Label4', 1, 11, 0], 
              [5, 'Label2', 0, 0, -4],[1, 'Label2', 1, 8, 2], [2, 'Label5', 0, 20, 5], [3, 'Label2', 1, 20, 2], [4, 'Label4', 0, 1, 0], 
              [5, 'Label3', 0, 1, -4],[1, 'Label3', 1, 8, 2], [2, 'Label4', 0, 20, 5], [3, 'Label1', 1, 20, 2], [4, 'Label3', 0, 1, 0], 
              [5, 'Label4', 0, 1, -4],[1, 'Label4', 1, 8, 2], [2, 'Label3', 0, 20, 5], [3, 'Label3', 1, 20, 2], [4, 'Label5', 0, 1, 0], 
              [5, 'Label5', 0, 1, -4]],
              columns=['ID', 'Label', 'Status', 'Coeff', 'result'])

cm = {'TP': 0,'FP': 0}

For each ID in df, I would like to find the number of times the column Coeff is greater than Result when Status column is 1. If this count is greater than 3 then TP should be incremented by 1 and if it less than 3, then FP should be incremented by 1.

Example: When ID is 1111 and Status 1, if the Coeff column is greater than Result column twice for that particular ID, then FP must be incremented by 1.

I tried to add a new column called count for each ID and assigned a value as 1 everytime the column Coeff was greater than Result.

for ID in df.groupby('ID'): 
  df.loc[(df['Coeff'] > df['Result']), 'count'] = 1
  df_new = list(df[['ID','count']].groupby(df['ID']))

Then I thought of finding whether count has the number 1 in it. If it does, then increment TP. Otherwise, increment FP.

But I couldn't achieve it.

How do I get the required result?

cs95 · Accepted Answer

A simple grouping operation on a masked comparison should do:

v = df.Coeff.gt(df.result).where(df.Status.astype(bool)).groupby(df.ID).sum()

Or (to retain dtype=int, thanks piR!),

v = df.Coeff.gt(df.result).where(df.Status.astype(bool), 0).groupby(df.ID).sum()

v   # second expression result

ID
1    3
2    2
3    3
4    0
5    0
dtype: int64

Now,

cm['TP'] = v.gt(3).sum()
cm['FP'] = v.lt(3).sum()

Details
df.Coeff.gt(df.result) returns a mask. Now, hide all those values for which df.Status is not 1. This is done using (df.Coeff > df.result).where(df.Status.astype(bool)). Finally, take this masked result, and group on ID, followed by a sum to get your result.

pandas groupby and update the sum of the number of times the values in one column is greater than the other column

Answers (1)

Related Questions