Reputation: 23
I am making a predictor that generates 3 values: A, B, C for each prediction. I have made predictions on a dataset of ~7000 samples and built a Pandas dataframe that looks like this:
Sample | A | B | C | Correct |
---|---|---|---|---|
Sample_1 | 0.8 | 0.4 | 0.9 | True |
Sample_2 | 0.2 | 0.9 | 0.5 | False |
Sample_3 | 0.3 | 1.0 | 0.1 | True |
I want to be able to interpret the values A, B, C in my predictor to judge the quality of a prediction. How do I do this?
I can only think of combining them like this somehow: X = a*A + b*B + c*C with X being a measure of confidence in the prediction. But I wouldn't know how to get the optimal weights a, b, c.
Upvotes: 0
Views: 484
Reputation: 205
I think the right methodology for doing this type of task would be to follow these steps:
1. Encode the "Correct" column as a numeric target (True -> 1, False -> -1) and split the dataset into train and test sets.
2. Train a random forest classifier to predict the target from A, B, and C.
3. On the test set, get the probability of each prediction with predict_proba(X) and take the mean. To go deeper, you can look at the feature importances to see which of A, B, or C matters most.
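The steps above could be sketched roughly as follows with scikit-learn. This is only a minimal illustration: the dataframe is synthetic (the toy rule that generates "Correct" is an assumption made so the example runs on its own), and the hyperparameters are arbitrary choices rather than recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataframe (assumed columns: A, B, C, Correct).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(rng.random((n, 3)), columns=["A", "B", "C"])
# Toy rule so there is a signal to learn; the real relationship is unknown.
df["Correct"] = (df["A"] + 0.5 * df["C"] + rng.normal(0, 0.2, n)) > 0.75

# Step 1: encode the target and split into train and test sets.
X = df[["A", "B", "C"]]
y = df["Correct"].map({True: 1, False: -1})
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 2: train a random forest on A, B, C.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Step 3: probability that each test prediction is correct (class 1).
# classes_ is sorted, so look up the column that belongs to label 1.
pos_col = list(clf.classes_).index(1)
confidence = clf.predict_proba(X_test)[:, pos_col]
mean_confidence = confidence.mean()

# Impurity-based importances of A, B, C (they sum to 1).
importances = pd.Series(clf.feature_importances_, index=X.columns)

print("mean predicted probability of 'Correct':", mean_confidence)
print(importances.sort_values(ascending=False))
```

With the real data, `confidence` would play the role of the combined score X from the question, one value per sample, learned from the data instead of from hand-picked weights.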
Don't hesitate to look at the documentation on random forests here. I think this way you can see how A, B, and C act in the prediction. If you want another method afterwards, you could try an ANOVA test to check whether A, B, and C are independent of the target.
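One possible way to run the ANOVA check is scikit-learn's `f_classif`, which computes a one-way ANOVA F-statistic per feature against the class labels. The sketch below again uses a synthetic dataframe (the rule generating "Correct" is an assumption for illustration only), so the numbers it prints say nothing about the real predictor.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

# Synthetic stand-in for the real dataframe (assumed columns: A, B, C, Correct).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(rng.random((n, 3)), columns=["A", "B", "C"])
# Toy rule: A and C carry signal, B is pure noise.
df["Correct"] = (df["A"] + 0.5 * df["C"] + rng.normal(0, 0.2, n)) > 0.75

features = ["A", "B", "C"]
# One-way ANOVA F-test of each feature against the two classes.
f_stat, p_values = f_classif(df[features], df["Correct"].astype(int))

anova = pd.DataFrame({"F": f_stat, "p_value": p_values}, index=features)
print(anova)
```

A small p-value for a feature suggests its mean differs between correct and incorrect predictions, i.e. it is not independent of the target; a large p-value means the test found no such evidence.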
Upvotes: 1