Zakary Krumlinde

Reputation: 139

Using arrays in for loops in Python

I am trying to compare each element of just_test_data with every element of just_train_data and return the lowest distance, then do the same for the next element of just_test_data, and so on until every element of just_test_data has been processed.

When I first try to run the loop, I keep getting an error on this line:

step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)

IndexError: arrays used as indices must be of integer (or boolean) type

import numpy as np
testing_data = np.genfromtxt("C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-testing-data.csv", delimiter= ',')
training_data = np.genfromtxt("C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-training-data.csv", delimiter= ',')

#create 4 arrays, the first two with the measurements of training and testing data
#the last two have the labels of each line
just_test_data = np.array(testing_data[:, 0:4])
just_train_data = np.array(training_data[:, 0:4])
testing_labels = np.array(testing_data[:, 4])
training_labels = np.array(training_data[:, 4])

n = 0
while n < len(just_train_data):
    for i in just_test_data:
        old_distance = 'inf'
        step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
        step_2 = sum(step_1)
        new_distance = np.sqrt(step_2)
        if new_distance < old_distance:
            old_distance = new_distance
            index = n
        n = n + 1
print(training_labels[index])

Upvotes: 0

Views: 4640

Answers (2)

Ditchbuster
Ditchbuster

Reputation: 63

When you write for i in just_test_data:, i is the element itself (here, a whole row of four floats), not its index. Using that float row as an index is what raises the IndexError.

You probably want for i in range(len(just_test_data)) instead; then i is an integer from 0 to len(just_test_data) - 1.
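A minimal sketch, using a made-up 3-row array in place of the iris data, shows what each form of the loop hands you:

```python
import numpy as np

# Made-up stand-in for just_test_data: 3 rows of 4 measurements
just_test_data = np.array([[5.1, 3.5, 1.4, 0.2],
                           [6.2, 2.9, 4.3, 1.3],
                           [7.7, 3.0, 6.1, 2.3]])

# Iterating directly yields each row (a 1-D array of 4 floats), not an index
for row in just_test_data:
    print(row.shape)          # (4,)

# range(len(...)) yields the integer indices 0, 1, 2
for i in range(len(just_test_data)):
    print(i, just_test_data[i])

# enumerate gives both the index and the row
for i, row in enumerate(just_test_data):
    assert np.array_equal(row, just_test_data[i])
```

If you need both the position and the row, enumerate avoids indexing back into the array.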

Edit: a few other odd things in your code:

step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
step_2 = sum(step_1)
new_distance = np.sqrt(step_2)

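Once i is an integer index, just_test_data[i] and just_train_data[n] are each a row of four measurements, so these three lines compute the Euclidean distance between the two rows: square the per-feature differences, add them up, then take the square root. That part is fine, although the abs() is redundant because squaring already makes every term non-negative.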
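To check that these lines really are a Euclidean distance, here is a small sketch with two made-up rows (a and b are hypothetical stand-ins for just_test_data[i] and just_train_data[n]); np.linalg.norm gives the same number in one call:

```python
import numpy as np

a = np.array([5.1, 3.5, 1.4, 0.2])  # hypothetical test row
b = np.array([6.2, 2.9, 4.3, 1.3])  # hypothetical training row

step_1 = (a - b) ** 2          # squared per-feature differences (abs() not needed)
step_2 = np.sum(step_1)        # sum over the 4 features -> a single number
distance = np.sqrt(step_2)     # Euclidean distance between the two rows

# np.linalg.norm of the difference vector is the same quantity
assert np.isclose(distance, np.linalg.norm(a - b))
print(distance)
```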

old_distance = 'inf' is a string, not a number. You probably want np.inf or float('inf'). Also, because you set it inside the loop, it is reset before every comparison, so it never remembers the smallest distance found so far. Set it once per test point, before you start scanning the training points.
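A quick sketch of the difference, assuming Python 3 (Python 2 allowed mixed number/string comparisons, so the bug could pass silently there); the distances in the loop are made up:

```python
import numpy as np

old_distance = 'inf'
print(type(old_distance))    # <class 'str'>

# In Python 3, comparing a float with a string raises TypeError
comparison_failed = False
try:
    3.2 < old_distance
except TypeError:
    comparison_failed = True

# A real infinity is a float that compares greater than any finite number,
# so the first distance always replaces it
best = float('inf')          # np.inf is the same value
for d in [4.0, 2.5, 3.1]:    # made-up distances
    if d < best:
        best = d
print(best)                  # 2.5
```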

a quick pass at your code:

for i in range(len(just_test_data)):
    min_distance = np.inf    # reset once per test point
    for n in range(len(just_train_data)):
        # squared per-feature differences, summed, then square-rooted: Euclidean distance
        step_1 = (just_test_data[i] - just_train_data[n]) ** 2
        distance = np.sqrt(np.sum(step_1))
        if distance < min_distance:
            min_distance = distance
            index = n
    print(training_labels[index])   # label of the closest training point

This loops over every point in just_test_data, compares it with every point in just_train_data, and prints the label of the closest training point, which is what you described (a 1-nearest-neighbour classifier).
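If you want to drop the inner loop, np.linalg.norm with axis=1 can compute the distances from one test row to all training rows at once, and np.argmin picks the closest. This is a sketch: nearest_labels is a hypothetical helper name and the tiny arrays are made-up data.

```python
import numpy as np

def nearest_labels(just_test_data, just_train_data, training_labels):
    """Hypothetical helper: 1-nearest-neighbour label for each test row."""
    predicted = []
    for test_row in just_test_data:
        # Distance from this test row to every training row (shape: n_train,)
        distances = np.linalg.norm(just_train_data - test_row, axis=1)
        # Index of the closest training row
        index = np.argmin(distances)
        predicted.append(training_labels[index])
    return np.array(predicted)

# Made-up example data: two training points labelled 0 and 1
train = np.array([[0.0, 0.0, 0.0, 0.0],
                  [10.0, 10.0, 10.0, 10.0]])
labels = np.array([0.0, 1.0])
test = np.array([[1.0, 1.0, 1.0, 1.0],
                 [9.0, 9.0, 9.0, 9.0]])
print(nearest_labels(test, train, labels))   # [0. 1.]
```

With your arrays this would be nearest_labels(just_test_data, just_train_data, training_labels).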

Upvotes: 1

Marcio Lima

Reputation: 11

By using for i in just_test_data, you're iterating over the elements (rows) of the just_test_data array, not over indices between 0 and the array length.
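A small sketch that reproduces the error with made-up data (a 2-row stand-in for just_test_data): each i is a float row, and using it as an index is what NumPy rejects.

```python
import numpy as np

# Made-up stand-in for just_test_data
just_test_data = np.array([[5.1, 3.5, 1.4, 0.2],
                           [6.2, 2.9, 4.3, 1.3]])

errors = []
for i in just_test_data:
    # i is a row of floats such as array([5.1, 3.5, 1.4, 0.2]), not an integer
    try:
        just_test_data[i]          # indexing with a float array
    except IndexError as e:
        errors.append(str(e))

print(errors[0])

# With integer indices the same lookup works
for i in range(len(just_test_data)):
    row = just_test_data[i]
```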

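Also, your n = n + 1 line seems to be indented one level too deep: it sits inside the for loop, so n is incremented once per test row instead of once per training row.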

Here's my guess for an updated version of your code:

import numpy as np
testing_data = np.genfromtxt(r"C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-testing-data.csv", delimiter= ',')
training_data = np.genfromtxt(r"C:\Users\zkrumlinde\Desktop\Statistical Programming\Week 3\iris-training-data.csv", delimiter= ',')

#create 4 arrays, the first two with the measurements of training and testing data
#the last two have the labels of each line
just_test_data = np.array(testing_data[:, 0:4])
just_train_data = np.array(training_data[:, 0:4])
testing_labels = np.array(testing_data[:, 4])
training_labels = np.array(training_data[:, 4])

old_distance = np.inf   # a number, not the string 'inf'; set once so it is not reset on every comparison
n = 0
while n < len(just_train_data):
    for i in range(len(just_test_data)):
        step_1 = (abs(just_test_data[i] - just_train_data[n]) ** 2)
        step_2 = sum(step_1)
        new_distance = np.sqrt(step_2)
        if new_distance < old_distance:
            old_distance = new_distance
            index = n
    n = n + 1
print(training_labels[index])   # label of the training point closest to any test point
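For comparison, a fully vectorized sketch using NumPy broadcasting computes every test-to-training distance in one go and also scores the predictions. predict_1nn, the toy arrays, and the accuracy step are assumptions for illustration, not part of the question:

```python
import numpy as np

def predict_1nn(test_X, train_X, train_y):
    """Hypothetical helper: 1-nearest-neighbour prediction for every test row."""
    # Broadcast (n_test, 1, 4) - (1, n_train, 4) -> (n_test, n_train, 4)
    diff = test_X[:, np.newaxis, :] - train_X[np.newaxis, :, :]
    # Euclidean distance matrix, shape (n_test, n_train)
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    # For every test row, the index of the closest training row
    nearest = np.argmin(dist, axis=1)
    return train_y[nearest]

# Made-up toy data
train_X = np.array([[0.0, 0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0, 10.0]])
train_y = np.array([0.0, 1.0])
test_X = np.array([[1.0, 1.0, 1.0, 1.0],
                   [9.0, 9.0, 9.0, 9.0],
                   [2.0, 2.0, 2.0, 2.0]])
test_y = np.array([0.0, 1.0, 1.0])   # made-up "true" labels

pred = predict_1nn(test_X, train_X, train_y)
accuracy = np.mean(pred == test_y)   # fraction of correct predictions
print(pred, accuracy)
```

With the question's arrays this would be pred = predict_1nn(just_test_data, just_train_data, training_labels) followed by np.mean(pred == testing_labels). The intermediate diff array holds n_test * n_train * 4 values, which is tiny for iris but grows quickly with larger data; the per-row loop above uses far less memory.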

Upvotes: 1
