I have two columns, Prediction and GroundTruth. I want to get a running (cumulative) count of true positives as a series, using either NumPy or pandas.
For example, my data is:
Prediction  GroundTruth
True        True
True        False
True        True
False       True
False       False
True        True
I want a list with the following output:
tp_list = [1,1,2,2,2,3]
Is there a one-liner way to do this in numpy or pandas?
Currently, this is my solution:
tp = 0
tp_list = []
for p, g in zip(data.Prediction, data.GroundTruth):
    if p and g:  # TP case: predicted True and actually True
        tp = tp + 1
    tp_list.append(tp)  # record the running count for every row
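For reference, the loop above can also be collapsed into a pure-Python one-liner with itertools.accumulate (a standard-library alternative, not NumPy or pandas). A minimal sketch, assuming the example data is held in a DataFrame named data as in the question:

```python
from itertools import accumulate

import pandas as pd

# Example data from the question
data = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
})

# 1 for a true positive (both True), 0 otherwise, then a running total
tp_list = list(accumulate(int(p and g) for p, g in zip(data.Prediction, data.GroundTruth)))
print(tp_list)  # [1, 1, 2, 2, 2, 3]
```

This still iterates in Python, so for large frames the vectorized answers below should be faster.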
Maybe you can use all:
df.all(1).cumsum().tolist()
Out[156]: [1, 1, 2, 2, 2, 3]
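Note that df.all(1) checks every column in each row, so it only counts true positives when df contains exactly these two boolean columns. A sketch with a hypothetical extra column named Reviewed (not part of the question's data) shows the difference, and how selecting the two label columns first avoids it:

```python
import pandas as pd

df = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
    "Reviewed":    [False, True, True, True, True, True],  # hypothetical extra column
})

# all(1) over the whole frame also requires Reviewed to be True
whole_frame = df.all(axis=1).cumsum().tolist()  # [0, 0, 1, 1, 1, 2]

# Restricting all() to the two label columns gives the true-positive count
tp_running = df[["Prediction", "GroundTruth"]].all(axis=1).cumsum().tolist()  # [1, 1, 2, 2, 2, 3]
```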
NumPy solution:
np.cumsum(np.all(df.values,1))
Out[159]: array([1, 1, 2, 2, 2, 3], dtype=int32)
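The int32 in that output comes from NumPy's default integer type, which is platform-dependent (it is 32-bit on some Windows builds). A self-contained sketch using plain arrays, with np.logical_and in place of np.all and an explicit dtype so the result type does not depend on the platform:

```python
import numpy as np

pred = np.array([True, True, True, False, False, True])
truth = np.array([True, False, True, True, False, True])

# Element-wise AND marks true positives; cumsum turns them into a running count.
# dtype=np.int64 fixes the accumulator type instead of relying on the platform default.
tp_running = np.cumsum(np.logical_and(pred, truth), dtype=np.int64)
print(tp_running)  # [1 1 2 2 2 3]
```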
If you want to know how many of your True predictions are actually True (true positives), use
(df['Prediction'] & df['GroundTruth']).cumsum()
0 1
1 1
2 2
3 2
4 2
5 3
dtype: int64
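The same & plus cumsum pattern extends to the other confusion-matrix cells. A sketch, assuming both columns have bool dtype (on an object column holding Python bools, ~ would not act as a logical NOT):

```python
import pandas as pd

df = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
})
p, g = df["Prediction"], df["GroundTruth"]

# Running counts of each confusion-matrix cell
running = pd.DataFrame({
    "TP": (p & g).cumsum(),    # predicted True, actually True
    "FP": (p & ~g).cumsum(),   # predicted True, actually False
    "FN": (~p & g).cumsum(),   # predicted False, actually True
    "TN": (~p & ~g).cumsum(),  # predicted False, actually False
})
print(running)
```

Each row of running sums to the number of rows seen so far, since every row falls into exactly one cell.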
(thanks @Peter Leimbigiler for chiming in)
If you want to know how many predictions were correct (true positives plus true negatives), just compare the columns with == and use cumsum:
(df['Prediction'] == df['GroundTruth']).cumsum()
which outputs
0 1
1 1
2 2
3 2
4 3
5 4
dtype: int64
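To make the difference between the two counts concrete: the == version counts true negatives as well, so it equals the running true positives plus the running true negatives. A quick sketch checking this on the question's data (bool dtype assumed):

```python
import pandas as pd

df = pd.DataFrame({
    "Prediction":  [True, True, True, False, False, True],
    "GroundTruth": [True, False, True, True, False, True],
})
p, g = df["Prediction"], df["GroundTruth"]

correct = (p == g).cumsum()  # running count of correct predictions
tp = (p & g).cumsum()        # running true positives
tn = (~p & ~g).cumsum()      # running true negatives

# Every correct prediction is either a true positive or a true negative
assert (correct == tp + tn).all()
print(correct.tolist())  # [1, 1, 2, 2, 3, 4]
```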
You can always get a list by using .tolist():
(df['Prediction'] == df['GroundTruth']).cumsum().tolist()
[1, 1, 2, 2, 3, 4]
To get a running count (i.e., cumulative sum) of true positives, i.e., rows where Prediction == True and GroundTruth == True, the solution is a modification of @RafaelC's answer:
(df['Prediction'] & df['GroundTruth']).cumsum()
0 1
1 1
2 2
3 2
4 2
5 3
(df['Prediction'] & df['GroundTruth']).cumsum().tolist()
[1, 1, 2, 2, 2, 3]