Reputation: 335
Full replit here: https://repl.it/@JacksonEnnis/KNNPercentage
I am trying to use the KNN classifier from scikit-learn to make some predictions.
I have two functions, recurse() and predict(). recurse() is intended to iterate through every single possible combo of features, while predict is supposed to do the actual
def predict(self, data, answers):
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split as tts
    import numpy as np
    if len(data) > 1:
        print("length before transposition {}".format(len(data)))
        #n_data = np.transpose(data)
        #print("length after transposition {}".format(len(n_data)))
        knn = KNeighborsClassifier(n_neighbors=1)
        xTrain, xTest, yTrain, yTest = tts(data, answers)
        print("xTrain data: {}".format(len(xTrain)))
        knn.fit(xTrain, yTrain)
        print(knn.score(xTest, yTest))
def recurse(self, data):
    self.predict(data, self.y)
    if len(data) > 0:
        self.recurse(self.rLeft(data))
    if len(data) > 1:
        self.recurse(self.rMid(data))
    if len(data) > 2:
        self.recurse(self.rRight(data))
However, when I run the program, it reports a problem with the train/test split line. I have checked the samples in each feature, as well as the answers, and found that they are all the same length, so I am unsure why this is happening.
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    best = Config(apple)
  File "/home/runner/Config.py", line 13, in __init__
    self.predict(self.features, self.y)
  File "/home/runner/Config.py", line 45, in predict
    xTrain, xTest, yTrain, yTest = tts(data, answers)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [20, 499]
Upvotes: 0
Views: 80
Reputation: 304
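You have your axes reversed. Every array you pass to train_test_split must have the same array.shape[0], i.e. the same number of samples along the first axis. The error says data has length 20 (presumably one row per feature) while answers has length 499, so data is laid out features-first instead of samples-first.
I recommend you check out the scikit-learn docs for more examples. Transposing data before the split should fix it:
tts(np.array(data).T, answers)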
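To make the shape issue concrete, here is a minimal sketch that reproduces the error and then applies the transpose. The random arrays are a made-up stand-in for the question's data (20 features with 499 samples each, matching the numbers in the traceback), and `split_features_first` is a hypothetical helper name, not something from the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Assumed stand-in for the question's data: one row per feature,
# i.e. shape (n_features, n_samples) = (20, 499).
n_features, n_samples = 20, 499
data = rng.normal(size=(n_features, n_samples))
answers = rng.integers(0, 2, size=n_samples)


def split_features_first(data, answers):
    """Transpose feature-major data to (n_samples, n_features), then split."""
    X = np.asarray(data).T
    if X.shape[0] != len(answers):
        raise ValueError(
            "expected {} samples, got {}".format(len(answers), X.shape[0])
        )
    return train_test_split(X, answers, random_state=0)


# Passing the feature-major array directly triggers the same ValueError
# as in the question, because len(data) is 20 but len(answers) is 499.
try:
    train_test_split(data, answers)
    message = None
except ValueError as err:
    message = str(err)

# After transposing, both arrays agree on the first axis and the split works.
x_train, x_test, y_train, y_test = split_features_first(data, answers)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)
score = knn.score(x_test, y_test)
print("train shape:", x_train.shape, "test shape:", x_test.shape)
print("accuracy on random labels:", score)
```

Since the labels here are random, the score itself is meaningless; the point is only that the split and fit run once the rows are samples and the columns are features.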
Upvotes: 1