MissTeapot

Reputation: 23

How to optimise weights of a linear combination in a model?

I am making a predictor that generates 3 values: A, B, C for each prediction. I have made predictions on a dataset of ~7000 samples and built a Pandas dataframe that looks like this:

Sample     A    B    C    Correct
Sample_1   0.8  0.4  0.9  True
Sample_2   0.2  0.9  0.5  False
Sample_3   0.3  1.0  0.1  True

I want to be able to interpret the values A, B, C in my predictor to judge the quality of a prediction. How do I do this?

The only thing I can think of is to combine them somehow, like X = a*A + b*B + c*C, with X being a measure of confidence in the prediction. But I don't know how to find the optimal weights a, b, c.

Upvotes: 0

Views: 484

Answers (1)

Virgaux Pierre

Reputation: 205

I think the right methodology for this type of task would be to follow these steps:

  • Encode the values in the "Correct" column so that True -> 1 and False -> -1, and split the dataset into train and test sets.

  • Train a random forest to classify the target from A, B and C.

  • On the test set, get the probability of each prediction with predict_proba(X) and take the mean. To go deeper, you can look at the feature importances to know which of A, B or C matters most (see the sketch just below this list).
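
Below is a minimal sketch of these steps with scikit-learn. The dataframe here is a synthetic stand-in (random values) for your ~7000-sample dataframe; only the column names A, B, C and Correct are taken from your example.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the ~7000-sample dataframe from the question.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame(rng.random((n, 3)), columns=["A", "B", "C"])
    df["Correct"] = df[["A", "B", "C"]].mean(axis=1) > rng.random(n)

    # Step 1: encode the target (True -> 1, False -> -1) and split train/test.
    X = df[["A", "B", "C"]]
    y = df["Correct"].map({True: 1, False: -1})
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # Step 2: train a random forest to classify the target from A, B, C.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    # Step 3: probability of the "correct" class on the test set, and its mean.
    correct_col = list(clf.classes_).index(1)
    proba_correct = clf.predict_proba(X_test)[:, correct_col]
    print("mean P(correct):", proba_correct.mean())

    # Feature importances: which of A, B, C the forest relies on most.
    print(dict(zip(X.columns, clf.feature_importances_)))

Note that scikit-learn does not strictly require the 1/-1 encoding (it accepts the boolean column directly); the sketch just follows the steps as written.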

Don't hesitate to look at the documentation on random forests here. I think this way you can know how A, B and C act in the prediction. If you want another method afterwards, you could try an ANOVA test to see whether A, B and C are independent of the target.
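
One possible reading of the ANOVA suggestion: for each of A, B and C, test whether its distribution differs between correct and incorrect predictions. A sketch with scipy.stats.f_oneway, reusing the df from the snippet above:

    from scipy.stats import f_oneway

    # One-way ANOVA per score: does its distribution differ between correct
    # and incorrect predictions? (Reuses `df` from the sketch above.)
    for col in ["A", "B", "C"]:
        stat, p = f_oneway(
            df.loc[df["Correct"], col],   # scores where the prediction was correct
            df.loc[~df["Correct"], col],  # scores where it was not
        )
        print(f"{col}: F={stat:.2f}, p={p:.4g}")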

Upvotes: 1
