Reputation: 135
My fellow Team,
I'm having an issue. Here is my data (df):
----------------------
   Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0            34.497268    12.655651        39.577668              4.082621           587.951054
1            31.926272    11.109461        37.268959              2.664034           392.204933
2            33.000915    11.330278        37.110597              4.104543           487.547505
3            34.305557    13.717514        36.721283              3.120179           581.852344
4            33.330673    12.795189        37.536653              4.446308           599.406092
5            33.871038    12.026925        34.476878              5.493507           637.102448
6            32.021596    11.366348        36.683776              4.685017           521.572175
I want to apply KNN:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Features and target taken from the DataFrame shown above
X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=42)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
The call to fit raises:
ValueError: Unknown label type: 'continuous'
Upvotes: 3
Views: 6801
Reputation: 2629
It looks like you are actually trying to do regression rather than classification: your code predicts the yearly amount spent, which is a continuous number. In this case, use
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)
instead. If you really have a classification task, for example sorting customers into classes such as 'yearly amount spent is low' and 'yearly amount spent is high', you need to discretize the labels and convert them into strings or integers (as explained by @Miriam Farber), using thresholds that you have to choose yourself.
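A minimal sketch of the regression route, assuming only the seven sample rows shown in the question as a stand-in for the full df (far too few rows for a real model, but enough to show that KNeighborsRegressor accepts the continuous target where KNeighborsClassifier does not):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# The seven sample rows from the question, used as a stand-in for the full dataset.
df = pd.DataFrame({
    'Avg. Session Length': [34.497268, 31.926272, 33.000915, 34.305557, 33.330673, 33.871038, 32.021596],
    'Time on App': [12.655651, 11.109461, 11.330278, 13.717514, 12.795189, 12.026925, 11.366348],
    'Time on Website': [39.577668, 37.268959, 37.110597, 36.721283, 37.536653, 34.476878, 36.683776],
    'Length of Membership': [4.082621, 2.664034, 4.104543, 3.120179, 4.446308, 5.493507, 4.685017],
    'Yearly Amount Spent': [587.951054, 392.204933, 487.547505, 581.852344, 599.406092, 637.102448, 521.572175],
})

X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# A regressor accepts a continuous target. With n_neighbors=1 each prediction
# is simply the target value of the closest training row.
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
print(predictions)
```

On real data you would normally tune n_neighbors (and consider scaling the features) rather than keep n_neighbors=1.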
Upvotes: 0
Reputation: 19634
The values in the Yearly Amount Spent column are real numbers, so they cannot serve as labels for a classification problem. As the scikit-learn documentation puts it:
When doing classification in scikit-learn, y is a vector of integers or strings.
That is why you get the error. If you want to build a classification model, you first need to decide how to map these values onto a finite set of labels.
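As an illustration, scikit-learn's type_of_target helper (in sklearn.utils.multiclass) reports what kind of target it sees. The sketch below again uses the seven sample rows from the question and assumes purely hypothetical thresholds of 500 and 600 to bin the spending into 'low', 'medium' and 'high' with pd.cut; once the target is discrete, KNeighborsClassifier fits without the error.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.multiclass import type_of_target

# The seven sample rows from the question, used as a stand-in for the full dataset.
df = pd.DataFrame({
    'Avg. Session Length': [34.497268, 31.926272, 33.000915, 34.305557, 33.330673, 33.871038, 32.021596],
    'Time on App': [12.655651, 11.109461, 11.330278, 13.717514, 12.795189, 12.026925, 11.366348],
    'Time on Website': [39.577668, 37.268959, 37.110597, 36.721283, 37.536653, 34.476878, 36.683776],
    'Length of Membership': [4.082621, 2.664034, 4.104543, 3.120179, 4.446308, 5.493507, 4.685017],
    'Yearly Amount Spent': [587.951054, 392.204933, 487.547505, 581.852344, 599.406092, 637.102448, 521.572175],
})

X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
print(type_of_target(y))  # continuous: this is what the classifier rejects

# Hypothetical thresholds: (0, 500] -> low, (500, 600] -> medium, above 600 -> high.
y_class = pd.cut(y, bins=[0, 500, 600, float('inf')],
                 labels=['low', 'medium', 'high']).astype(str)
print(type_of_target(y_class))  # multiclass

# With discrete string labels the classifier trains normally.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y_class)
print(knn.classes_)
```

The thresholds are the part you have to decide from your own problem; the binning code itself stays the same.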
Note that if you only want to avoid the error, you could do:
import numpy as np
y = np.asarray(df['Yearly Amount Spent'], dtype="|S6")
This converts the values in y into fixed-width byte strings, which scikit-learn accepts as class labels. However, every label will then typically appear in only one sample, so you cannot build a meaningful model with such a set of labels.
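To see the problem concretely, here is a small check on the seven Yearly Amount Spent values from the question (a stand-in for the full column): after the "|S6" conversion, the number of distinct labels equals the number of samples, so each "class" has exactly one example.

```python
import numpy as np

# Yearly Amount Spent for the seven sample rows shown in the question.
spent = np.array([587.951054, 392.204933, 487.547505, 581.852344,
                  599.406092, 637.102448, 521.572175])

# Same conversion as in the answer: fixed-width 6-byte strings.
y = np.asarray(spent, dtype="|S6")

# Count how many samples share each label.
labels, counts = np.unique(y, return_counts=True)
print(len(labels), len(y))
print(counts)
```

With one sample per label, a classifier has nothing to generalize from, which is why discretizing into a few meaningful bins (or using a regressor) is the real fix.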
Upvotes: 6