Row-wise comparisons between two arrays

Question

I'm new to NumPy and Python and have a question. I hope you can help me.

Say I have two arrays. Both arrays have 11 columns, with the first column being the index.

This is an example of array1. Note that array1 is updated live.

[(0, 537, 504, 547, 560, 553,  -92, -5132, 15972, 1, 1)
 (0, 537, 504, 547, 559, 553, -100, -5128, 16108, 1, 1)
 (0, 537, 504, 547, 560, 553, -124, -5088, 16092, 1, 1)
 (0, 537, 504, 547, 559, 553, -140, -5160, 16164, 1, 1)
 (0, 537, 504, 547, 560, 552, -112, -5320, 16072, 1, 1)
 (0, 537, 504, 547, 560, 552,  -24, -5092, 16092, 1, 1)
 (0, 537, 504, 547, 560, 551, -148, -5104, 16108, 1, 1)
 (0, 537, 504, 547, 560, 551,  -92, -5136, 16092, 1, 1)
 (0, 537, 504, 547, 560, 551,    4, -5032, 16076, 1, 1)
 (0, 537, 504, 547, 560, 551,  -60, -5096, 15944, 1, 1)
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1)
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1)
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1)
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1)]

I want to compare the last row of array1 against every row of array2. (It has to be the last row of array1 because its contents come from a constantly updating CSV file.) Specifically, I want to find the row of array2 whose non-label columns are closest to the non-label columns of the last row of array1. The label in array1 will be null and should not count in the comparison. The match doesn't have to be exact, but it should be the closest match within a set tolerance.

array2 will serve as a dictionary of sorts: its labels are the reference, and the feature rows under each label are the samples. Am I going about this right, or is there a more appropriate way to achieve this?

I intend to have 26 different labels in array2, one for each letter of the alphabet, each with its own set of column features. Each label will have 10 sample rows, and the tolerance range should be indicated by those 10 samples. Here is a sample of the CSV data in array2 (which I can already convert into arrays):

LABEL,F1,F2,F3,F4,F5,X,Y,Z,C1,C2
1, 537, 504, 547, 560, 553, -92, -5132, 15972, 1, 1
1, 537, 504, 547, 559, 553, -100, -5128, 16108, 1, 1
1, 537, 504, 547, 560, 553, -124, -5088, 16092, 1, 1
1, 537, 504, 547, 559, 553, -140, -5160, 16164, 1, 1
1, 537, 504, 547, 560, 552, -112, -5320, 16072, 1, 1
1, 537, 504, 547, 560, 552, -24, -5092, 16092, 1, 1
1, 537, 504, 547, 560, 551, -148, -5104, 16108, 1, 1
1, 537, 504, 547, 560, 551, -92, -5136, 16092, 1, 1
1, 537, 504, 547, 560, 551, 4, -5032, 16076, 1, 1
1, 537, 504, 547, 560, 551, -60, -5096, 15944, 1, 1
2, 537, 504, 547, 560, 553, -92, -5132, 15972, 0, 0
2, 537, 504, 547, 559, 553, -100, -5128, 16108, 0, 0
2, 537, 504, 547, 560, 553, -124, -5088, 16092, 0, 0
2, 537, 504, 547, 559, 553, -140, -5160, 16164, 0, 0
2, 537, 504, 547, 560, 552, -112, -5320, 16072, 0, 0
2, 537, 504, 547, 560, 552, -24, -5092, 16092, 0, 0
2, 537, 504, 547, 560, 551, -148, -5104, 16108, 0, 0
2, 537, 504, 547, 560, 551, -92, -5136, 16092, 0, 0
2, 537, 504, 547, 560, 551, 4, -5032, 16076, 0, 0
2, 537, 504, 547, 560, 551, -60, -5096, 15944, 0, 0

Label 1 is A and label 2 is B. As you can see, in this sample they differ only in the 1s and 0s of the last two columns. The other letters of the alphabet will differ in multiple columns, which is why I want the last row of array1 to search for its closest match in array2.

I want to perform collective row-wise comparisons between the two arrays.

In the end, I want to print the label of the row in array2 whose features are closest to those of the latest row of array1. Since the inputs in array1 are constantly updating, if the values in the last row of array1 change and then correspond to a different label, I want the printed label to update live as well.

Again, I'm a beginner in Python and NumPy, and I don't know if I'm approaching this correctly. Thank you in advance. I would really appreciate any help.

Christian Sloper · Accepted Answer

Your arrays A and B:

 import numpy as np

 A = np.array([(0, 537, 504, 547, 560, 553,  -92, -5132, 15972, 1, 1),
 (0, 537, 504, 547, 559, 553, -100, -5128, 16108, 1, 1),
 (0, 537, 504, 547, 560, 553, -124, -5088, 16092, 1, 1),
 (0, 537, 504, 547, 559, 553, -140, -5160, 16164, 1, 1),
 (0, 537, 504, 547, 560, 552, -112, -5320, 16072, 1, 1),
 (0, 537, 504, 547, 560, 552,  -24, -5092, 16092, 1, 1),
 (0, 537, 504, 547, 560, 551, -148, -5104, 16108, 1, 1),
 (0, 537, 504, 547, 560, 551,  -92, -5136, 16092, 1, 1),
 (0, 537, 504, 547, 560, 551,    4, -5032, 16076, 1, 1),
 (0, 537, 504, 547, 560, 551,  -60, -5096, 15944, 1, 1),
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1),
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1),
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1),
 (0, 537, 504, 547, 560, 552,  -48, -5084, 16072, 1, 1)])

 B = np.array([[1, 537, 504, 547, 560, 553, -92, -5132, 15972, 1, 1],
[1, 537, 504, 547, 559, 553, -100, -5128, 16108, 1, 1],
[1, 537, 504, 547, 560, 553, -124, -5088, 16092, 1, 1],
[1, 537, 504, 547, 559, 553, -140, -5160, 16164, 1, 1],
[1, 537, 504, 547, 560, 552, -112, -5320, 16072, 1, 1],
[1, 537, 504, 547, 560, 552, -24, -5092, 16092, 1, 1],
[1, 537, 504, 547, 560, 551, -148, -5104, 16108, 1, 1],
[1, 537, 504, 547, 560, 551, -92, -5136, 16092, 1, 1],
[1, 537, 504, 547, 560, 551, 4, -5032, 16076, 1, 1],
[1, 537, 504, 547, 560, 551, -60, -5096, 15944, 1, 1],
[2, 537, 504, 547, 560, 553, -92, -5132, 15972, 0, 0],
[2, 537, 504, 547, 559, 553, -100, -5128, 16108, 0, 0],
[2, 537, 504, 547, 560, 553, -124, -5088, 16092, 0, 0],
[2, 537, 504, 547, 559, 553, -140, -5160, 16164, 0, 0],
[2, 537, 504, 547, 560, 552, -112, -5320, 16072, 0, 0],
[2, 537, 504, 547, 560, 552, -24, -5092, 16092, 0, 0],
[2, 537, 504, 547, 560, 551, -148, -5104, 16108, 0, 0],
[2, 537, 504, 547, 560, 551, -92, -5136, 16092, 0, 0],
[2, 537, 504, 547, 560, 551, 4, -5032, 16076, 0, 0],
[2, 537, 504, 547, 560, 551, -60, -5096, 15944, 0, 0]])

Difference between every row of B and the last row of A (NumPy broadcasts the single row across all rows of B):

D = B - A[-1]
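Note that this subtraction also includes column 0, so the label itself contributes to the difference (B holds 1 or 2 there, while the live row holds 0). The question says the label should not count, so one possible variant slices that column off first. The sketch below uses small stand-in arrays built from a few of the rows above; the names `query`, `features` and `D_features` are illustrative and not part of the original answer.

```python
import numpy as np

# Small stand-ins for A and B, using rows copied from the arrays above.
A = np.array([[0, 537, 504, 547, 560, 551, -60, -5096, 15944, 1, 1],
              [0, 537, 504, 547, 560, 552, -48, -5084, 16072, 1, 1]])
B = np.array([[1, 537, 504, 547, 560, 553, -92, -5132, 15972, 1, 1],
              [1, 537, 504, 547, 560, 552, -24, -5092, 16092, 1, 1],
              [2, 537, 504, 547, 560, 552, -24, -5092, 16092, 0, 0]])

query = A[-1, 1:]      # last row of A without the label column, shape (10,)
features = B[:, 1:]    # every row of B without the label column, shape (3, 10)

# Broadcasting subtracts the single query row from each row of features.
D_features = features - query
print(D_features.shape)
```

With only two labels the extra column barely matters, but with 26 labels a row labelled 26 would be penalised by 26 before any feature is compared.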

"Closest" is always open to discussion, but say you want the row where the sum of the absolute differences is at a minimum.

np.abs(D).sum(axis=1).argmin()

This yields 5, meaning that row 5 of B (counting from zero) is closest.
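Since the choice of distance is a judgment call, here is a rough sketch comparing three options: the summed absolute difference used above, the Euclidean distance, and a per-column scaled version. The scaled variant is an assumption on my part, not something from the question: it divides each column by its standard deviation so that wide-ranging columns (such as X, Y, Z) do not drown out 0/1 columns (such as C1, C2). The function name `nearest_row` and the toy data are hypothetical.

```python
import numpy as np

def nearest_row(features, query, metric="manhattan"):
    """Return the index of the row in `features` closest to `query`."""
    diff = features - query
    if metric == "manhattan":
        # Sum of absolute differences, as in the answer above.
        dist = np.abs(diff).sum(axis=1)
    elif metric == "euclidean":
        # Square root of the summed squared differences.
        dist = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "scaled":
        # Divide each column by its standard deviation before summing.
        std = features.std(axis=0)
        std[std == 0] = 1.0  # constant columns: avoid division by zero
        dist = np.abs(diff / std).sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return int(dist.argmin())

# Toy data chosen so that the metrics disagree.
toy = np.array([[0, 10], [3, 4], [6, 0]])
origin = np.array([0, 0])

print(nearest_row(toy, origin, "manhattan"))  # distances 10, 7, 6
print(nearest_row(toy, origin, "euclidean"))  # distances 10, 5, 6
```

Which metric suits the sensor data is something only testing on real samples can settle.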

B[np.abs(D).sum(axis=1).argmin()] yields:

array([    1,   537,   504,   547,   560,   552,   -24, -5092, 16092,
           1,     1])
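To get from the closest row to the printed label the question asks for, one option is to read column 0 of the winning row and reject matches that are too far away. The sketch below is a minimal illustration under stated assumptions: `closest_label`, `label_to_letter`, the `tolerance` values and the small array `B_small` are all hypothetical, and a single fixed tolerance is used rather than one derived from the 10 samples per label.

```python
import numpy as np

def closest_label(query_row, B, tolerance):
    """Return the label of the row in B closest to query_row.

    query_row includes its own (ignored) label in column 0.
    tolerance is an assumed upper bound on the summed absolute
    difference; None is returned when the best match exceeds it.
    """
    labels = B[:, 0]
    dist = np.abs(B[:, 1:] - query_row[1:]).sum(axis=1)
    best = dist.argmin()
    if dist[best] > tolerance:
        return None
    return int(labels[best])

def label_to_letter(label):
    # The question states that 1 is A and 2 is B; continuing the
    # pattern up to 26 = Z is an assumption.
    return chr(ord("A") + label - 1)

# Three reference rows and one live row, copied from the data above.
B_small = np.array([[1, 537, 504, 547, 560, 553, -92, -5132, 15972, 1, 1],
                    [1, 537, 504, 547, 560, 552, -24, -5092, 16092, 1, 1],
                    [2, 537, 504, 547, 560, 552, -24, -5092, 16092, 0, 0]])
live_row = np.array([0, 537, 504, 547, 560, 552, -48, -5084, 16072, 1, 1])

match = closest_label(live_row, B_small, tolerance=100)
print(label_to_letter(match) if match is not None else "no match within tolerance")
```

For live data, the same function could be called again each time a new last row arrives from the CSV file.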
