Reputation: 323
I am trying to predict wine quality (an integer score from 1 to 10) using regression models such as linear regression, SGDRegressor, ridge, and lasso.
Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
Independent variables: volatile acidity, residual sugar, free sulfur dioxide, total sulfur dioxide, alcohol. Dependent variable: quality.
Linear model
from sklearn import linear_model
# Ordinary least squares on the training split
regr = linear_model.LinearRegression(n_jobs=3)
regr.fit(x_train, y_train)
predicted = regr.predict(x_test)
The predicted values for LinearRegression are:
array([ 5.33560542, 5.47347404, 6.09337194, ..., 5.67566813, 5.43609198, 6.08189 ])
The predicted values are floats rather than integers (1, 2, 3, ..., 10). I tried rounding them with NumPy:
predicted = np.round(regr.predict(x_test))
but my accuracy went down with this attempt.
SGDRegressor model.
import numpy as np
from sklearn import linear_model
np.random.seed(0)
clf = linear_model.SGDRegressor()
clf.fit(x_train, y_train)
predicted = np.floor(clf.predict(x_test))
The predicted output values for SGDRegressor are:
array([ -2.77685458e+12, 3.26826414e+12, 4.18655713e+11, ...,
4.72375220e+12, -7.08866307e+11, 3.95571514e+12])
Here I am unable to convert the output values into integers.
Could someone please tell me the best way to predict wine quality using these regression models?
Upvotes: 4
Views: 9754
Reputation: 24742
You are fitting a regression model, so the output is continuous by nature.
Note that predicting wine quality is not a classification problem. The response variable y, the wine quality, has an intrinsic order: a score of 6 is strictly better than a score of 5. It is NOT a categorical variable, where the numbers merely label different groups that cannot be compared with one another.
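If integer scores are still wanted for reporting, one option is to keep the regression and map its continuous output onto the 1 to 10 scale afterwards. The sketch below is only an illustration under stated assumptions: the data are synthetic stand-ins rather than the UCI file, to_quality_scores is a hypothetical helper name, and the StandardScaler step is my addition, not something from the question. It is included because the very large SGDRegressor outputs in the question are consistent with unscaled features, and stochastic gradient descent is known to be sensitive to feature scaling.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def to_quality_scores(continuous_pred, low=1, high=10):
    """Round to the nearest integer, then clip into [low, high] (hypothetical helper)."""
    return np.clip(np.rint(continuous_pred), low, high).astype(int)


# Synthetic stand-in for the five wine features (NOT the real dataset):
# columns on very different scales, as acidity and sulfur dioxide would be.
rng = np.random.default_rng(0)
scales = np.array([0.1, 5.0, 17.0, 42.0, 1.2])
offsets = np.array([0.3, 6.0, 35.0, 138.0, 10.5])
z = rng.normal(size=(500, 5))
X = z * scales + offsets
# Made-up integer target that depends on two of the columns plus noise.
y = np.clip(np.rint(6 + 0.8 * z[:, 4] - 0.5 * z[:, 0]
                    + rng.normal(scale=0.5, size=500)), 1, 10)

x_train, x_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Standardize the features before SGD so the gradient steps stay well behaved.
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(x_train, y_train)

raw = model.predict(x_test)          # continuous regression output
predicted = to_quality_scores(raw)   # integers in 1..10

# Mean absolute error respects the ordering of the scores, unlike exact-match accuracy.
mae = np.mean(np.abs(predicted - y_test))
print(predicted[:10], mae)
```

Rounding discards information, so an exact-match accuracy computed on the rounded values will often look worse than an error metric such as mean absolute error computed on the same predictions; which one to report depends on how the scores will be used.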
Upvotes: 4