Masum Billah

Reputation: 135

Unknown label type: 'continuous'

Hello all, I'm having an issue with the following data:
----------------------

   Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0            34.497268    12.655651        39.577668              4.082621           587.951054
1            31.926272    11.109461        37.268959              2.664034           392.204933
2            33.000915    11.330278        37.110597              4.104543           487.547505
3            34.305557    13.717514        36.721283              3.120179           581.852344
4            33.330673    12.795189        37.536653              4.446308           599.406092
5            33.871038    12.026925        34.476878              5.493507           637.102448
6            32.021596    11.366348        36.683776              4.685017           521.572175

I want to apply KNN:

X = df[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']] 
y = df['Yearly Amount Spent'] 

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

from sklearn.neighbors import KNeighborsClassifier 
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)

But the call to fit raises:

ValueError: Unknown label type: 'continuous'

Upvotes: 3

Views: 6801

Answers (2)

ml4294

Reputation: 2629

I think you are actually trying to do regression rather than classification, since your code looks like it is meant to predict the yearly amount spent as a number. In that case, use

from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)

instead. If you really do have a classification task, for example sorting customers into classes such as 'yearly amount spent is low' and 'yearly amount spent is high', you should discretize the labels and convert them into strings or integers (as explained by @Miriam Farber), using thresholds that you have to set manually in this case.
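To make the regression suggestion concrete, here is a minimal sketch. The DataFrame is synthetic: the column names come from the question, but the values, the linear relationship used to generate the target, and the choice of n_neighbors=5 are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the asker's DataFrame (all values are made up).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Avg. Session Length": rng.normal(33, 1, n),
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
# Assumed relationship between features and target, for illustration only.
df["Yearly Amount Spent"] = (
    25 * df["Avg. Session Length"]
    + 38 * df["Time on App"]
    + 60 * df["Length of Membership"]
    - 1000
    + rng.normal(0, 10, n)
)

X = df[["Avg. Session Length", "Time on App",
        "Time on Website", "Length of Membership"]]
y = df["Yearly Amount Spent"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# A regressor accepts continuous targets, so fit() no longer raises.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)

predictions = knn.predict(X_test)   # one float per test row
r2 = knn.score(X_test, y_test)      # R^2 on the held-out data
print(predictions[:3], r2)
```

Since KNN is distance-based, scaling the features (for example with StandardScaler) before fitting is often worth trying on the real data.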

Upvotes: 0

Miriam Farber

Reputation: 19634

The values in the Yearly Amount Spent column are real numbers, so they cannot serve as labels for a classification problem. As the scikit-learn documentation puts it:

When doing classification in scikit-learn, y is a vector of integers or strings.

Hence you get the error. If you want to build a classification model, you need to decide how to transform these values into a finite set of labels.
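One possible way to get a finite set of labels is to bin the continuous target. The sketch below uses pandas.qcut to split it into three equally sized groups. The data is synthetic, and the class names "low", "medium" and "high" and the quantile-based cut points are assumptions for illustration, not thresholds taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic features and spending amounts (made up for this sketch).
rng = np.random.default_rng(0)
n = 150
X = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
spent = (38 * X["Time on App"]
         + 60 * X["Length of Membership"]
         + rng.normal(0, 10, n))

# Bin the continuous target into three quantile-based classes.
# The class names are hypothetical; pick whatever suits the task.
y_class = pd.qcut(spent, q=3, labels=["low", "medium", "high"]).astype(str)

# String labels are a valid classification target, so fit() succeeds.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y_class)

pred = knn.predict(X.iloc[:5])
print(pred)
```

If the thresholds should be fixed amounts rather than quantiles, pandas.cut with an explicit list of bin edges does the same job.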

Note that if you only want to avoid the error, you could do:

import numpy as np
y = np.asarray(df['Yearly Amount Spent'], dtype="|S6")

This converts the values in y into strings, which the classifier accepts as labels. However, every label will then appear in only one sample, so you cannot really build a meaningful model with such a set of labels.
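A quick way to see the problem is to count how often each converted label occurs. This sketch uses the seven Yearly Amount Spent values shown in the question. The variable names are only illustrative.

```python
import numpy as np

# The seven "Yearly Amount Spent" values shown in the question.
spent = np.array([587.951054, 392.204933, 487.547505, 581.852344,
                  599.406092, 637.102448, 521.572175])

# Same conversion as above: each float becomes a 6-byte string.
labels = np.asarray(spent, dtype="|S6")

# Count the samples per distinct label.
unique_labels, counts = np.unique(labels, return_counts=True)
print(len(unique_labels), counts.max())
```

Every sample ends up in its own class, so the classifier has nothing to generalize from.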

Upvotes: 6
