Reputation: 856
I have a list of length 50 filled with arrays of length 5. I am trying to calculate the distance between each array in the list and update a numpy array with the values.
The distance calculation is just the square root of the sum of the squared differences between corresponding elements of the two arrays.
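For a single pair of arrays, that calculation can be sketched roughly as follows. The helper name euclidean_distance is hypothetical and is used here only for illustration; it does not appear in the original code.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences between a and b."""
    total = 0.0
    for x, y in zip(a, b):
        # Accumulate the squared difference of each pair of elements.
        total += (x - y) ** 2
    return math.sqrt(total)

# Two length-5 vectors that differ in exactly two positions.
print(euclidean_distance([1, 0, 1, 0, 0], [0, 0, 1, 1, 0]))
```

The two vectors differ in two positions, so the sum of squares is 2 and the result is the square root of 2.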
When I try:
primaryCustomer = np.zeros(shape = (50,50))
for customer in range(0,50):
    for pair in range(0,50):
        thisCustomer = [0 for i in range(51)]
        if customer == pair:
            thisCustomer[pair] = 999
        else:
            calculateScores = (((Customer[customer][0]-Customer[pair][0])**2
                              + (Customer[customer][1]-Customer[pair][1])**2
                              + (Customer[customer][2]-Customer[pair][2])**2
                              + (Customer[customer][3]-Customer[pair][3])**2
                              + (Customer[customer][4]-Customer[pair][4])**2 )**(0.5))
            thisCustomer[pair] = calculateScores
    np.append(primaryCustomer, thisCustomer)
a couple of things happen:
Any changes I make, like trying to treat thisCustomer in the loop as an array instead of a list and append to it, end up fixing one area but screwing up other ones even worse.
Here's how I'm getting the Customer data:
Customer = [[0,0,0,0,0] for i in range(51)]
for n in range(51):
    Customer[n] = np.ones(5)
    Customer[n][randint(2,4):5] = 0
    np.random.shuffle(Customer[n])
I know there might be packaged ways to do this, but I'm trying to understand how things like KNN work in the background, so I'd like to keep to figuring out the logic in loops like above. Beyond that, any help would be greatly appreciated.
Reputation: 582
A couple things to notice at first.
primaryCustomer[a][b] = primaryCustomer[b][a]
because you are using a distance metric. This means that the ranges on your two for loops can be reset:
numCustomers = 51
primaryCustomer = np.zeros(shape = (numCustomers, numCustomers))
for customerA in range(numCustomers-1):
    # Only visit pairs with customerB > customerA (the upper triangle).
    for customerB in range(customerA+1, numCustomers):
        # dist stands for your distance calculation between the two customers
        primaryCustomer[customerA][customerB] = dist(customerA,customerB)
# Mirror the upper triangle into the lower one; the diagonal stays 0.
primaryCustomer += np.transpose(primaryCustomer)
Note: you can change the second for loop's range to also start from 0 to keep your original loop structure, but then you will need to remove the transposition line. You can also have
primaryCustomer[a][b] = primaryCustomer[b][a] = dist(a,b)
if you'd rather not use the transposition but still avoid unnecessary calculations.
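The snippet above leaves dist undefined. As a minimal runnable sketch, assume dist takes two customer indices and returns the Euclidean distance between the corresponding rows. The snake_case names, the fixed seed, and the random 0/1 stand-in data are assumptions for illustration, not the asker's exact data generator. This version uses the double assignment rather than the transposition.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
num_customers = 51
# Stand-in data: 51 random 0/1 vectors of length 5.
customers = rng.integers(0, 2, size=(num_customers, 5)).astype(float)

def dist(a, b):
    """Euclidean distance between the customers at indices a and b."""
    total = 0.0
    for k in range(customers.shape[1]):
        total += (customers[a][k] - customers[b][k]) ** 2
    return total ** 0.5

primary_customer = np.zeros((num_customers, num_customers))
for a in range(num_customers - 1):
    for b in range(a + 1, num_customers):
        # Fill both mirrored cells at once, so no transpose step is needed.
        primary_customer[a][b] = primary_customer[b][a] = dist(a, b)

print(primary_customer.shape)
```

Each pair is computed once, so the inner calculation runs 51 * 50 / 2 = 1275 times instead of 2601, and the diagonal keeps its initial value of 0.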
primaryCustomer = np.zeros(shape = (50,50)) is, I'm assuming, meant to store the distance between each pair of customers. However, it looks like you have 51 customers, not 50?
The thisCustomer list doesn't seem necessary, and in fact the solution posted by Reedinationer initializes it but never even uses it. Also, as someone stated already, that's not how np.append works. You're best off directly modifying the distance matrix you create originally.
primaryCustomer[a][a] = 999? Shouldn't the distance between a list and itself be 0? If you really do want it to be 999, I encourage you to figure out how to modify the code block above to account for that.
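To illustrate the point about np.append, here is a small sketch on a 3x3 array. The names matrix and row are made up for the example. np.append returns a new array rather than modifying its argument, and without an axis argument it flattens both inputs first.

```python
import numpy as np

matrix = np.zeros((3, 3))
row = [1.0, 2.0, 3.0]

# np.append does not change matrix; the result must be captured to be used.
result = np.append(matrix, row)
print(matrix.shape)  # (3, 3): the original array is unchanged
print(result.shape)  # (12,): both inputs were flattened, 9 + 3 elements

# Writing into a preallocated matrix is done by index assignment instead.
matrix[0] = row
print(matrix[0])
```

This is why calling np.append inside the loop and discarding its return value leaves the distance matrix full of zeros.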
Reputation: 5774
I think this is what you are going for, but correct me if I'm wrong:
import numpy as np
from random import randint

# Build 51 random 0/1 vectors of length 5.
Customer = [[0, 0, 0, 0, 0] for i in range(51)]
for n in range(51):
    Customer[n] = np.ones(5)
    Customer[n][randint(2, 4):5] = 0
    np.random.shuffle(Customer[n])

primaryCustomer = np.zeros(shape=(50, 50))
for customer in range(0, 50):
    thisCustomer = [0 for i in range(51)]  # moved up one level; no longer used
    for pair in range(0, 50):
        if customer == pair:
            # Sentinel value for a customer compared with itself.
            primaryCustomer[customer][pair] = 999
        else:
            calculateScores = (((Customer[customer][0] - Customer[pair][0]) ** 2
                                + (Customer[customer][1] - Customer[pair][1]) ** 2
                                + (Customer[customer][2] - Customer[pair][2]) ** 2
                                + (Customer[customer][3] - Customer[pair][3]) ** 2
                                + (Customer[customer][4] - Customer[pair][4]) ** 2) ** 0.5)
            # Write the distance straight into the matrix.
            primaryCustomer[customer][pair] = calculateScores
print(primaryCustomer)
I think the main issue with your loops was the location of thisCustomer = [0 for i in range(51)]; I think you meant to have it up one more level, like in mine. I don't see any need for this line, though, so I changed thisCustomer[pair] to write directly to primaryCustomer[customer][pair] instead. That removes the need to rebuild thisCustomer = [0 for i in range(51)] on every loop, and taking the line out entirely would speed up your program and reduce memory usage.
Sample output:
[[999. 2.23606798 1. ... 2. 0. 1.73205081]
 [ 2.23606798 999. 2. ... 1. 2.23606798 1.41421356]
 [ 1. 2. 999. ... 1.73205081 1. 2. ]
 ...
 [ 2. 1. 1.73205081 ... 999. 2. 1.73205081]
 [ 0. 2.23606798 1. ... 2. 999. 1.73205081]
 [ 1.73205081 1.41421356 2. ... 1.73205081 1.73205081 999. ]]