Reputation: 81
I am learning ML from Udemy, and below is the code the instructor uses in his lecture. I am not totally satisfied with this code, because it produces many k values whose error rates are almost the same (I have to manually check the plot for the k values whose error rates differ only negligibly).
Is there any other method available to find the best k value (n_neighbors)?
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test come from an earlier train/test split.
error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Then I plot the error rate vs. the K value:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed',
         marker='o', markerfacecolor='red', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
plt.show()
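The best I can do programmatically with this approach is take the k with the smallest error, e.g. (a minimal sketch reusing the error_rate list from above):

# np.argmin returns the index of the smallest error; k started at 1, so offset by 1.
best_k = int(np.argmin(error_rate)) + 1
print(best_k, error_rate[best_k - 1])

but ties and near-ties still need manual inspection, so I am looking for a more principled method.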
Upvotes: 1
Views: 5206
Reputation: 522
GridSearchCV and other similar tools are available in scikit-learn; they run cross-validation over a parameter grid and find the optimal parameters.
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data
y = iris.target

# Search k from 1 to 99 combined with both weighting schemes.
k_range = list(range(1, 100))
weight_options = ["uniform", "distance"]
param_grid = dict(n_neighbors=k_range, weights=weight_options)

knn = KNeighborsClassifier()
grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy')
grid.fit(X, y)

print(grid.best_score_)
print(grid.best_params_)
print(grid.best_estimator_)
# 0.9800000000000001
# {'n_neighbors': 13, 'weights': 'uniform'}
# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
# metric_params=None, n_jobs=None, n_neighbors=13, p=2,
# weights='uniform')
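As a usage note: GridSearchCV refits the best estimator on the full dataset by default (refit=True), so the fitted grid object can be used for prediction directly. A small sketch (the sample values are hypothetical iris-like measurements):

# refit=True (the default) retrains the best model on all of X, y,
# so grid.predict delegates to that refitted estimator.
sample = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical measurements
print(grid.predict(sample))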
All hyper-parameter optimizers are listed here: https://scikit-learn.org/stable/modules/classes.html#hyper-parameter-optimizers
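Among those, RandomizedSearchCV (from the same module) is useful when the grid gets large: it samples a fixed number of parameter settings instead of trying them all. A minimal sketch reusing the knn, param_grid, X, y from above (n_iter and random_state are illustrative choices):

from sklearn.model_selection import RandomizedSearchCV

# Try 20 randomly sampled settings instead of all 99 * 2 combinations.
rand = RandomizedSearchCV(knn, param_grid, n_iter=20, cv=10,
                          scoring='accuracy', random_state=42)
rand.fit(X, y)
print(rand.best_score_, rand.best_params_)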
Upvotes: 5