Reputation: 139
I am trying to compare each element in just_test_data against all the elements in just_train_data and return the lowest distance, then run the next element of just_test_data through all of just_train_data, and so on until every element of just_test_data has been processed.
The error I keep getting, the first time the loop runs, is on this line:
step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
IndexError: arrays used as indices must be of integer (or boolean) type
import numpy as np
testing_data = np.genfromtxt("C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-testing-data.csv", delimiter= ',')
training_data = np.genfromtxt("C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-training-data.csv", delimiter= ',')
#create 4 arrays, the first two with the measurements of training and testing data
#the last two have the labels of each line
just_test_data = np.array(testing_data[:, 0:4])
just_train_data = np.array(training_data[:, 0:4])
testing_labels = np.array(testing_data[:, 4])
training_labels = np.array(training_data[:, 4])
n = 0
while n < len(just_train_data):
    for i in just_test_data:
        old_distance = 'inf'
        step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
        step_2 = sum(step_1)
        new_distance = np.sqrt(step_2)
        if new_distance < old_distance:
            old_distance = new_distance
            index = n
        n = n + 1
print(training_labels[index])
Upvotes: 0
Views: 4640
Reputation: 63
When you write for i in just_test_data, i is the element itself (a row of the array), not its index.
You probably want something like for i in range(len(just_test_data)), which makes i an integer running from 0 to len(just_test_data) - 1.
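To see the difference concretely, here is a minimal sketch using a small made-up array in place of the iris data (the names data, rows_by_index and indexing_with_a_row_fails are only for illustration, they are not from the question). Indexing with a row of floats is what triggers the IndexError, while indexing with an integer from range(len(...)) works.

```python
import numpy as np

# Hypothetical stand-in for just_test_data: two rows of float measurements.
data = np.array([[5.1, 3.5], [4.9, 3.0]])


def rows_by_index(arr):
    """Visit every row through an integer index."""
    return [arr[i] for i in range(len(arr))]


def indexing_with_a_row_fails(arr):
    """Return True if using a float row as an index raises IndexError."""
    for row in arr:  # row is an array of floats, not a position
        try:
            arr[row]
        except IndexError:
            return True
    return False
```

If you need both the position and the row, enumerate(arr) gives you the pair without indexing by hand.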
Edit: a few odd things in your code:
step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
step_2 = sum(step_1)
new_distance = np.sqrt(step_2)
For a single pair of values, squaring and then taking the square root just gives back abs(just_test_data[i] - just_train_data[n]). Are you meaning to add up many step_1 values and only then take the sqrt? You need to check your indentation.
old_distance = 'inf' is a string, not a number. You are probably looking for either np.inf or float('inf'). Also, because you set it inside the for loop, it gets reset for every i; you probably want it above the line for i in just_test_data:
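A small sketch of both points, assuming Python 3 (where comparing a number with a string raises TypeError). The names smallest and string_sentinel_fails are made up for illustration.

```python
def smallest(values):
    """Running minimum with a numeric sentinel set once, before the loop."""
    best = float("inf")  # a real float, so `v < best` is a numeric comparison
    for v in values:
        if v < best:
            best = v
    return best


def string_sentinel_fails():
    """Return True if comparing a float with the string 'inf' raises TypeError."""
    try:
        1.5 < "inf"
    except TypeError:
        return True
    return False
```

Because the sentinel is created outside the loop, it keeps the smallest value seen so far instead of being overwritten on every pass.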
A quick pass at your code:
min_distance = np.inf  # numeric infinity, set once before the loops
for n in range(len(just_train_data)):
    step_2 = 0
    for i in range(len(just_test_data)):
        # np.sum collapses the squared differences of the row to one number
        step_1 = np.sum((just_test_data[i] - just_train_data[n]) ** 2)
        step_2 += step_1
    distance = np.sqrt(step_2)
    if distance < min_distance:
        min_distance = distance
        index = n
print(training_labels[index])
This compares each point in just_train_data against all the points in just_test_data to compute a single distance for that training point, then prints the label of the training point whose distance is smallest.
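If the goal is what the question describes, the nearest training row for each test row, the loops swap roles: the outer loop walks the test rows and the minimum is reset once per test row. A rough sketch on made-up data; the function name and the sample arrays are assumptions, not part of the original code, and it assumes there is at least one training row.

```python
import numpy as np


def nearest_label_per_test_row(test_rows, train_rows, train_labels):
    """For each test row, return the label of the closest training row."""
    predictions = []
    for test_row in test_rows:
        min_distance = np.inf  # reset once per test row, not per comparison
        best = None
        for n, train_row in enumerate(train_rows):
            # Euclidean distance between the two rows
            distance = np.sqrt(np.sum((test_row - train_row) ** 2))
            if distance < min_distance:
                min_distance = distance
                best = n
        predictions.append(train_labels[best])
    return predictions


# Hypothetical sample data standing in for the iris arrays.
sample_train = np.array([[0.0, 0.0], [10.0, 10.0]])
sample_labels = np.array([1.0, 2.0])
sample_test = np.array([[1.0, 1.0], [9.0, 8.0]])
sample_predictions = nearest_label_per_test_row(sample_test, sample_train, sample_labels)
```

With the real data you would pass just_test_data, just_train_data and training_labels, and could compare the result against testing_labels.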
Upvotes: 1
Reputation: 11
By writing for i in just_test_data you're iterating over the elements of the just_test_data array, not over an index between 0 and the array length.
Also, it seems that your n = n + 1 line is not indented correctly.
Here's my guess for an updated version of your code:
import numpy as np

# raw strings (r"...") stop the backslashes in the Windows paths from being read as escape sequences
testing_data = np.genfromtxt(r"C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-testing-data.csv", delimiter=',')
training_data = np.genfromtxt(r"C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-training-data.csv", delimiter=',')

#create 4 arrays, the first two with the measurements of training and testing data
#the last two have the labels of each line
just_test_data = np.array(testing_data[:, 0:4])
just_train_data = np.array(training_data[:, 0:4])
testing_labels = np.array(testing_data[:, 4])
training_labels = np.array(training_data[:, 4])

old_distance = float('inf')  # numeric infinity, set once before the loops
n = 0
while n < len(just_train_data):
    for i in range(len(just_test_data)):
        # Euclidean distance between test row i and training row n
        step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
        step_2 = sum(step_1)
        new_distance = np.sqrt(step_2)
        if new_distance < old_distance:
            old_distance = new_distance
            index = n
    n = n + 1  # advance once per training row, after the inner loop

print(training_labels[index])
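As an alternative to the explicit loops, NumPy broadcasting can compute every test-to-train distance at once. This is only a sketch on made-up arrays: nearest_labels and the sample data are hypothetical, and it assumes both arrays have the same number of columns.

```python
import numpy as np


def nearest_labels(test_rows, train_rows, train_labels):
    """Label of the closest training row for every test row, without Python loops."""
    # diff[i, n, :] holds test_rows[i] - train_rows[n]
    diff = test_rows[:, np.newaxis, :] - train_rows[np.newaxis, :, :]
    # dist[i, n] is the Euclidean distance between test row i and training row n
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # index of the closest training row for each test row
    nearest = dist.argmin(axis=1)
    return train_labels[nearest]


# Hypothetical sample data standing in for the iris arrays.
train = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([1.0, 2.0])
test = np.array([[1.0, 1.0], [9.0, 8.0], [0.5, -0.5]])
predicted = nearest_labels(test, train, labels)
```

The intermediate diff array has shape (number of test rows, number of training rows, number of columns), so this trades memory for speed; for a data set the size of iris that is not a concern.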
Upvotes: 1