Reputation: 177
I'm working on the basics of machine learning with the iris dataset. I think I understand the idea of splitting data and making predictions on new data; however, I'm having trouble understanding the results I get for the code below:
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data
y = iris.target
print(len(X))  # result: 150
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(y_pred)
print(metrics.accuracy_score(y_test, y_pred))
Result:
[1 2 2 0 2 1 0 2 0 1 1 2 2 2 0 0 2 2 0 0 1 2 0 2 1 2 1 1 1 2 0 1 1 0 1 0 0 2]
0.95 accuracy (about 95%)
I only get back 38 results. From what I understand, the data is split into 50/50 chunks, meaning I should get back 50 results for the data that was held out from training. Why do I get only 38?
I feel like my biggest question regarding machine learning is how to actually use the model.
Upvotes: 0
Views: 76
Reputation: 558
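By default, train_test_split sets test_size to 0.25. With 150 samples that is 37.5 (12.5 for each chunk of 50), which is rounded up to 38, so 38 values are correct. The remaining 112 samples go to the training set.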
sklearn.model_selection.train_test_split
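To check the arithmetic yourself, here is a minimal sketch that compares the default split with an explicit one. The variable names (default_sizes, expected_test, explicit_sizes) and the choice of test_size=50 are only for illustration; passing an integer asks for that many test samples rather than a fraction.

```python
import math

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
n_samples = len(X)  # 150 rows in the iris dataset

# Default split: test_size falls back to 0.25 when neither
# test_size nor train_size is given.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)
default_sizes = (len(X_train), len(X_test))

# The fractional test size is rounded up: ceil(0.25 * 150) = ceil(37.5)
expected_test = math.ceil(0.25 * n_samples)

# Explicit split (illustrative): an integer test_size is an absolute
# number of test samples, so this holds out exactly 50 rows.
X_train50, X_test50, y_train50, y_test50 = train_test_split(
    X, y, test_size=50, random_state=5
)
explicit_sizes = (len(X_train50), len(X_test50))

print(default_sizes, expected_test, explicit_sizes)
```

If you want 50 predictions back, pass test_size=50 (or test_size=1/3) instead of relying on the default.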
Upvotes: 1