Most efficient way to clean and reorder two arrays based on their matching selected columns

Question

Say we have array1 and array2, both two-dimensional, which may contain non-unique rows and may have different numbers of rows.

My final goal is to have cleaned versions of the two arrays with the same shape, ordered such that for each row index the values in columns 2, 3, and 4 are the same.

Below I describe a possible sequence of steps to achieve this goal; I am wondering what the most efficient way to do it in numpy is.

1_if there are rows in array1 with the same values in columns 2, 3, 4, keep only one of them and remove the rest.

2_if there are rows in array2 with the same values in columns 2, 3, 4, keep only one of them and remove the rest.

So, based on those columns, both arrays will have unique rows.

3_then I want to remove the rows in either array that do not have a matching row in the other array in terms of columns 2, 3, 4.

So both arrays should have the same length now.

4_Then I want to reorder array1 so that, at the same indices, array2 has the same values in columns 2, 3, 4.
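The four steps above can be sketched roughly as follows. This is only one possible approach, not necessarily the fastest; the function name `clean_and_align` and its `cols` parameter are hypothetical, and the column indices are assumed to be zero-based.

```python
import numpy as np

def clean_and_align(a1, a2, cols=(2, 3, 4)):
    """Hypothetical helper: dedupe both arrays on `cols`, keep shared keys, align rows."""
    cols = list(cols)  # zero-based column indices that form the key

    # Steps 1 and 2: keep the first row for every distinct key.
    # np.unique sorts the keys, so u1 and u2 end up ordered by key.
    _, first1 = np.unique(a1[:, cols], axis=0, return_index=True)
    _, first2 = np.unique(a2[:, cols], axis=0, return_index=True)
    u1, u2 = a1[first1], a2[first2]

    # Step 3: after deduplication, a key that occurs twice in the stacked
    # keys must be present in both arrays.
    stacked = np.concatenate([u1[:, cols], u2[:, cols]])
    _, inverse, counts = np.unique(
        stacked, axis=0, return_inverse=True, return_counts=True
    )
    shared = counts[np.ravel(inverse)] == 2

    # Step 4: both halves are already sorted by key, so filtering aligns them.
    return u1[shared[:len(u1)]], u2[shared[len(u1):]]
```

The rows come back sorted by the key columns rather than in their original order, which is enough for the two results to line up row by row.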

-------------edit: numerical example:

array1 = array([[1,   4,   3, 64356, 5435, 434],
                [11,  46,  3, 7356,  585,  74],
                [51,  406, 3, 769,   5435, 24],
                [12,  45,  5, 656,   135,  134],
                [112, 475, 5, 656,   1385, 134],
                [13,  46,  5, 656,   1385, 19]])


array2 = array([[15,   44,  5,   656, 1385, 434],
                [165,  644, 5,   656, 1385, 48],
                [151,  436, 3,   356, 285,  74],
                [521,  406, 5,   656, 135,  24],
                [152,  445, 54,  56,  635,  134],
                [1812, 757, 542, 546, 185,  1834],
                [72,   77,  142, 66,  65,   64],
                [72,   727, 12,  16,  55,   634]])

array1_final = array([[112, 475, 5, 656, 1385, 134],
                      [12,  45,  5, 656, 135,  134]])

array2_final = array([[15,  44,  5, 656, 1385, 434],
                      [521, 406, 5, 656, 135,  24]])

Although array2[0] and array2[1] both match array1[4] in terms of columns 2, 3, 4, only one of them is kept in the final array2. Similarly, array1[5] was dropped because it duplicates array1[4] in those columns. The final arrays are in the same order in terms of the matching columns 2, 3, 4. The remaining rows are dropped because they have no matching counterpart in the other array in terms of columns 2, 3, 4.
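For the example above, a more compact sketch might view the key columns of each row as a single structured value and let `np.intersect1d` do the deduplication and matching in one call. The helper `row_keys` is hypothetical, and the sketch assumes both arrays share the same dtype.

```python
import numpy as np

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])
array2 = np.array([[15, 44, 5, 656, 1385, 434],
                   [165, 644, 5, 656, 1385, 48],
                   [151, 436, 3, 356, 285, 74],
                   [521, 406, 5, 656, 135, 24],
                   [152, 445, 54, 56, 635, 134],
                   [1812, 757, 542, 546, 185, 1834],
                   [72, 77, 142, 66, 65, 64],
                   [72, 727, 12, 16, 55, 634]])

def row_keys(a, cols):
    """Hypothetical helper: one structured scalar per row, built from `cols`."""
    sub = np.ascontiguousarray(a[:, cols])
    dt = np.dtype([(f"f{i}", sub.dtype) for i in range(sub.shape[1])])
    return sub.view(dt).ravel()

cols = [2, 3, 4]  # zero-based key columns
# With return_indices=True, intersect1d reports the index of the first
# occurrence of every common key in each input, ordered by key.
_, idx1, idx2 = np.intersect1d(
    row_keys(array1, cols), row_keys(array2, cols), return_indices=True
)
array1_final = array1[idx1]
array2_final = array2[idx2]
```

The surviving rows are paired as in the expected result, but they are ordered by key, so the row order may differ from the hand-written `array1_final` and `array2_final` shown earlier.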
