Jackson Ennis

KNN: Found input variables with inconsistent numbers of samples: [20, 499]

Full replit here: https://repl.it/@JacksonEnnis/KNNPercentage

I am trying to use the KNN classifier from scikit-learn to make some predictions.

I have two functions, recurse() and predict(). recurse() is intended to iterate through every possible combination of features, while predict() is supposed to do the actual prediction.

  def predict(self, data, answers):
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split as tts

    import numpy as np
    if len(data) > 1:

      print("length before transposition {}".format(len(data)))
      #n_data = np.transpose(data)

      #print("length after transposition {}".format(len(n_data)))
      knn = KNeighborsClassifier(n_neighbors=1)
      xTrain, xTest, yTrain, yTest = tts(data, answers)
      print("xTrain data: {}".format(len(xTrain)))
      knn.fit(xTrain, yTrain)
      print(knn.score(xTest, yTest))

  def recurse(self, data):
    self.predict(data, self.y)
    if len(data) > 0:
      self.recurse(self.rLeft(data))
    if len(data) > 1:
      self.recurse(self.rMid(data))
    if len(data) > 2:
      self.recurse(self.rRight(data))
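As an aside on recurse(): the helpers rLeft, rMid and rRight are not shown, so it is hard to tell whether the recursion visits each combination of features exactly once. One swapped-in alternative (not the author's method) is to enumerate the subsets directly with itertools.combinations. This is a minimal sketch, assuming `data` is a list of feature rows as in the question; `feature_subsets` is a hypothetical helper name.

```python
from itertools import combinations


def feature_subsets(data):
    """Hypothetical helper: yield every non-empty subset of the feature rows.

    Assumes `data` is a list of feature rows. Subsets are produced in order of
    size, then by row position; each subset is a list of rows.
    """
    n = len(data)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            yield [data[i] for i in idx]


# Three toy feature rows give 2**3 - 1 = 7 subsets.
toy = [[1, 2], [3, 4], [5, 6]]
subsets = list(feature_subsets(toy))
print(len(subsets))  # 7
```

With 20 features this yields 2**20 - 1 = 1,048,575 subsets, so fitting a model for each one may take a while.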

However, when I run the program, it raises an error on the train_test_split line. I have checked the number of samples in each feature, as well as in the answers, and they are all the same length, so I am not sure why this is happening.

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    best = Config(apple)
  File "/home/runner/Config.py", line 13, in __init__
    self.predict(self.features, self.y)
  File "/home/runner/Config.py", line 45, in predict
    xTrain, xTest, yTrain, yTest = tts(data, answers)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [20, 499]
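The two numbers in the message are the lengths scikit-learn computes for each argument, and len() of a list of lists counts its rows. Here is a minimal reproduction, assuming (hypothetically) that `data` holds 20 feature rows of 499 values each, which is what the [20, 499] in the message suggests. The random data is made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the question's data: 20 features, each a list of
# 499 values, stored one feature per row.
rng = np.random.default_rng(0)
data = [list(rng.random(499)) for _ in range(20)]
answers = list(rng.integers(0, 2, size=499))

# len() of a list of lists counts rows, i.e. features here, not samples.
print(len(data), len(answers))  # 20 499
print(np.array(data).shape)     # (20, 499)

message = None
try:
    train_test_split(data, answers)
except ValueError as err:
    # scikit-learn compares the first-axis lengths of all inputs.
    message = str(err)
print(message)
```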

Answers (1)

bhuvy

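You have your axes reversed. train_test_split expects every array you pass to have the same length along its first axis (array.shape[0]), with one row per sample. Your data is laid out with one row per feature (20 rows), while answers has one entry per sample (499), hence the [20, 499] mismatch. Transposing data fixes this. I recommend checking the scikit-learn docs for more examples.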

tts(np.array(data).T, answers)
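For illustration, here is a minimal sketch of predict() with that transpose applied. It is written as a standalone function rather than a method, and it assumes `data` is a list of feature rows and `answers` holds one label per sample. The random data and the fixed `random_state` are hypothetical choices, not from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def predict(data, answers, random_state=0):
    """Fit a 1-NN classifier on feature-major data and return test accuracy.

    Assumes `data` is n_features rows of n_samples values each and
    `answers` has n_samples labels.
    """
    # Transpose to scikit-learn's layout: one row per sample.
    X = np.asarray(data).T
    y = np.asarray(answers)
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(x_train, y_train)
    return knn.score(x_test, y_test)


# Hypothetical data: 20 features x 499 samples with binary labels.
rng = np.random.default_rng(0)
data = [list(rng.random(499)) for _ in range(20)]
answers = list(rng.integers(0, 2, size=499))

print(np.asarray(data).T.shape)  # (499, 20)
print(predict(data, answers))
```

Note that .T does nothing to a 1-D array, so a single feature should still be passed as a list containing one row (e.g. [feature]), which becomes shape (499, 1) after transposing.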
