Reputation: 31
I'm trying to build a machine learning approach but I'm having some problems. This is my code:
import sys
import scipy
import numpy
import matplotlib
import pandas
import sklearn
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
dataset = pandas.read_csv('Libro111.csv')
array = numpy.asarray(dataset,dtype=numpy.float64) #all values are float64
X = array[:,1:49]
Y = array[:,0]
validation_size = 0.2
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
scoring = 'accuracy'
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
results = []
names = []
for name, model in models:
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
And then I get two different errors.
For Logistic Regression:
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
I found someone who had the same problem, but I haven't been able to sort it out yet.
And (most importantly):
File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 97, in unique_labels
raise ValueError("Unknown label type: %s" % repr(ys))
ValueError: Unknown label type: (array([ 0.5, 0. , 1. , 1. , 0.5, 0.5, 1. , 0.5, 0. , 0.5, 1. ,
0. , 0. , 0. , 1. , 1......
In both cases the error occurs when I execute the "cv_results" line. I hope you can help me.
Upvotes: 3
Views: 7273
Reputation: 41
"ValueError: Unknown label type: 'continuous'" means that your "Y" values are not class labels. A classifier expects a small set of discrete labels, typically integers, where many rows share the same value and each value stands for one class. Your "Y" is a float array containing non-integer values such as 0.5, so scikit-learn treats the target as 'continuous', and "DecisionTreeClassifier", "KNeighborsClassifier", "LogisticRegression" (do not be fooled by its name, LogisticRegression is a classification method) and the other classifiers refuse to fit it. As it stands, data like this can only be passed to regression models (e.g. "RandomForestRegressor").
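To see how scikit-learn classifies a target array, you can call type_of_target on it yourself. This is only a small sketch: the labels below are made up to resemble the values in your traceback, and multiplying by 2 is just one assumed way of turning 0, 0.5 and 1 into integers.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up labels resembling the values shown in the traceback.
y_float = np.array([0.5, 0.0, 1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 0.0, 0.5])

# Assumed conversion: 0 -> 0, 0.5 -> 1, 1 -> 2.
y_int = (y_float * 2).astype(int)

kind_float = type_of_target(y_float)  # floats with non-integer values
kind_int = type_of_target(y_int)      # three distinct integer labels

print(kind_float)
print(kind_int)
```

Classifiers only accept the second kind of target, which is why the float array is rejected.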
Here are two solutions:
a) Group the Y values into bins (classes), stored as integer or string labels, and apply classification models to your data.
b) If you prefer your predictions to be numeric values (floats), use regression models to predict Y.
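Both options could look roughly like this. It is a sketch on random made-up data, not on your CSV, and it assumes Y only takes a few distinct values (0, 0.5, 1) so that every distinct value can become its own class; with truly continuous values you would have to choose bin edges instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

# Made-up data standing in for the CSV: 60 rows, 4 features.
rng = np.random.RandomState(7)
X = rng.rand(60, 4)
Y = rng.choice([0.0, 0.5, 1.0], size=60)

# a) Classification: map each distinct Y value to an integer class id.
class_values, Y_class = np.unique(Y, return_inverse=True)
clf = LogisticRegression().fit(X, Y_class)
# Translate predicted class ids back to the original Y values.
pred_labels = class_values[clf.predict(X)]

# b) Regression: keep Y as floats and predict numeric values.
reg = RandomForestRegressor(n_estimators=10, random_state=7).fit(X, Y)
pred_values = reg.predict(X)
```

With a) the predictions are always one of the original values, with b) they can be any float in between.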
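By the way, "scoring = 'accuracy'" is an evaluation metric for classification models. If you switch to regression, use a regression metric such as 'r2' or 'neg_mean_squared_error' instead.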
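If you go the regression route, the cross-validation loop could be adapted along these lines. Again a sketch on made-up data: LinearRegression and 'neg_mean_squared_error' are just example choices, and shuffle=True is added because newer scikit-learn versions reject a random_state on KFold without it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Made-up regression data: Y is a noisy linear function of X.
rng = np.random.RandomState(7)
X = rng.rand(100, 3)
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.randn(100)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
# The scorer returns negated MSE, so values closer to 0 are better.
scores = cross_val_score(LinearRegression(), X, Y, cv=kfold,
                         scoring="neg_mean_squared_error")

print("MSE: %f (%f)" % (-scores.mean(), scores.std()))
```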
Upvotes: 4