How to avoid float values in regression models

Question

I am trying to predict wine quality (on a scale from 1 to 10) using regression models such as linear regression, SGDRegressor, ridge, and lasso.

Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

Independent variables: volatile acidity, residual sugar, free sulfur dioxide, total sulfur dioxide, alcohol
Dependent variable: quality

Linear model

from sklearn import linear_model

regr = linear_model.LinearRegression(n_jobs=3)
regr.fit(x_train, y_train)
predicted = regr.predict(x_test)

Predicted values for LinearRegression:

array([ 5.33560542, 5.47347404, 6.09337194, ..., 5.67566813, 5.43609198, 6.08189 ])

The predicted values are floats instead of integers (1, 2, 3, ..., 10). I tried rounding the predicted values with NumPy:

predicted = np.round(regr.predict(x_test))

but my accuracy went down with this attempt.

SGDRegressor model

import numpy as np
from sklearn import linear_model

np.random.seed(0)
clf = linear_model.SGDRegressor()
clf.fit(x_train, y_train)
predicted = np.floor(clf.predict(x_test))

Predicted output values for SGDRegressor:

array([ -2.77685458e+12,   3.26826414e+12,   4.18655713e+11, ...,
     4.72375220e+12,  -7.08866307e+11,   3.95571514e+12])

Here I am unable to convert the output values into integers.

Could someone please tell me the best way to predict wine quality using these regression models?

Jianxun Li · Accepted Answer

You are doing a regression and therefore the output is continuous in nature.
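If integer scores are still needed for reporting, one option is to keep the continuous predictions for evaluation and convert a copy afterwards. The snippet below is a minimal sketch of that idea, not a recommendation from the original answer: the array `raw` is made-up data shaped like the LinearRegression output in the question, and the 1 to 10 bounds are taken from the stated quality range.

```python
import numpy as np

# Hypothetical continuous predictions (illustrative values only),
# including two that fall outside the valid 1..10 range.
raw = np.array([5.33560542, 5.47347404, 6.09337194, 0.4, 10.8])

# Round to the nearest integer, then clip into the valid score range.
scores = np.clip(np.rint(raw), 1, 10).astype(int)
print(scores)
```

Rounding throws away information, so error metrics computed on `scores` will usually look different from those computed on `raw`.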

The thing you should note is that your mini-project on predicting wine quality is not a classification problem. The response variable y, the wine quality, has an intrinsic order: a score of 6 is strictly better than a score of 5. It is NOT a categorical variable, where different numbers merely label different groups that cannot be compared.
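Because the scores are ordered, how far a prediction is from the true score matters, and exact-match accuracy ignores that distance. The following sketch uses invented labels and predictions (the arrays `y_true`, `near`, and `far` are assumptions for illustration, not results from the wine data) to show how accuracy and mean absolute error can rank two sets of predictions differently.

```python
import numpy as np

def accuracy(y, p):
    # Fraction of predictions that match the true score exactly.
    return float(np.mean(y == p))

def mae(y, p):
    # Mean absolute distance between predicted and true scores.
    return float(np.mean(np.abs(y - p)))

# Hypothetical true scores and two hypothetical sets of integer predictions.
y_true = np.array([5, 6, 6, 7, 5])
near = np.array([6, 5, 7, 6, 6])    # every prediction is off by exactly one
far = np.array([5, 6, 6, 1, 10])    # three exact hits, two large misses

for name, pred in [("near", near), ("far", far)]:
    print(name, "accuracy:", accuracy(y_true, pred), "MAE:", mae(y_true, pred))
```

Here `far` has the higher accuracy, yet `near` has the lower mean absolute error, which is the kind of difference an ordered response makes visible.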
