Reputation: 11
I have created some sample data for machine learning, just to check out how classification and regression models work.
My sample data has 50 rows with columns for Memory, CPU, and Responsetime. I generated Responsetime using the formula Memory*2 + CPU*0.7.
Now when I use this data to build classification models with different algorithms such as DecisionTree, RandomForest, SVM, NaiveBayes, SGD, and LogisticRegression, I get back kappa and the coefficients (model.coef_) from the model, and feature importances in the case of decision tree and random forest.
The coefficient values returned for Memory and CPU are nowhere near the formula that I used to generate the response time values. I am not able to tell whether the models I generated are right to use for prediction in this case or not.
For regression, Linear Regression did give me the right coefficients, matching my formula.
Upvotes: 0
Views: 907
Reputation: 2077
You gave a linear formula, Memory*2 + CPU*0.7, and linear regression, a method that learns the B_j values in y_i = B_0*1 + B_1*X_i_1 + ... + B_n*X_i_n, was able to model that with the coefficients you would expect. That's because the form of the linear regression model matches the form of your equation, so it makes sense to match the coefficients directly.
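To make that concrete, here is a minimal sketch with scikit-learn. The data is made up for illustration: the value ranges for Memory and CPU and the random seed are my assumptions, not your actual 50 rows. Only the formula comes from your question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical sample data: 50 rows, ranges chosen arbitrarily.
rng = np.random.default_rng(0)
memory = rng.uniform(1, 16, size=50)
cpu = rng.uniform(0, 100, size=50)

X = np.column_stack([memory, cpu])   # columns: Memory, CPU
y = memory * 2 + cpu * 0.7           # Responsetime from the question's formula

model = LinearRegression().fit(X, y)

# With noise-free linear data, the learned B_1 and B_2 should be very close
# to 2 and 0.7, and the intercept B_0 very close to 0.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Because the data has no noise and the model has exactly the same form as the formula, the fit should recover the coefficients up to floating-point error.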
For your classification algorithms, not only does the form of the equation not match your linear equation, but the problem is also not really a classification problem. You have given an example that is distinctly a regression problem.
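As a rough illustration of why the numbers differ, the sketch below turns the same kind of data into a classification problem. The median split into "slow" and "fast" is purely my assumption about how the target might have been binned, and the data is again made up. The logistic regression coefficients are weights on the log-odds of the class, and tree feature importances are normalized shares of impurity reduction, so neither is on the same scale as 2 and 0.7.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical sample data, same formula as in the question.
rng = np.random.default_rng(0)
memory = rng.uniform(1, 16, size=50)
cpu = rng.uniform(0, 100, size=50)
X = np.column_stack([memory, cpu])
response_time = memory * 2 + cpu * 0.7

# Assumed binning: label rows above the median response time as 1 ("slow").
labels = (response_time > np.median(response_time)).astype(int)

# coef_ here describes the decision boundary between the two classes,
# not the slope of Responsetime with respect to each feature.
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("logistic regression coef_:", clf.coef_)

# A regression tree can predict Responsetime directly, but it exposes
# feature importances (which sum to 1), not formula coefficients.
tree = DecisionTreeRegressor(random_state=0).fit(X, response_time)
print("tree feature importances:", tree.feature_importances_)
```

So the models are not necessarily wrong; their parameters just answer a different question from "what were the multipliers in my formula".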
Upvotes: 1