Say we have array1 and array2, both two-dimensional; each may contain non-unique rows, and they may have different numbers of rows.
My final goal is a cleaned version of the two arrays with the same shape, ordered so that at every row index the values in columns 2, 3, and 4 are the same in both arrays.
Below I describe one possible sequence of steps to reach this goal; I am wondering what the most efficient way to do it with NumPy is.
1. If array1 has rows with the same values in columns 2, 3, 4, keep only one of them.
2. If array2 has rows with the same values in columns 2, 3, 4, keep only one of them.
So, based on those columns, both arrays will have unique rows.
3. Then remove the rows in each array that have no matching row (in columns 2, 3, 4) in the other array.
So both arrays should now have the same length.
4. Then reorder array1 so that at the same indices array2 has the same values in columns 2, 3, 4.
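Steps 1 and 2 amount to keeping one row per distinct combination of columns 2, 3, 4. A minimal sketch, assuming NumPy 1.13 or later (for the axis argument of np.unique) and that the first occurrence should be kept, as in the example below; dedupe_on_columns is a hypothetical helper name, not a NumPy function.

```python
import numpy as np


def dedupe_on_columns(arr, cols=(2, 3, 4)):
    """Hypothetical helper: keep the first row for each distinct combination
    of values in `cols`, preserving the original row order."""
    keys = arr[:, list(cols)]
    # return_index gives the index of the first occurrence of each unique key
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    # np.unique returns keys in sorted order; sorting the indices restores input order
    return arr[np.sort(first_idx)]


array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])
array1_dedup = dedupe_on_columns(array1)
print(array1_dedup)
```

Applied to array1 from the example, this drops only array1[5], whose columns 2, 3, 4 repeat those of array1[4].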
Edit: numerical example:
array1 = array([[1, 4, 3, 64356, 5435, 434],
                [11, 46, 3, 7356, 585, 74],
                [51, 406, 3, 769, 5435, 24],
                [12, 45, 5, 656, 135, 134],
                [112, 475, 5, 656, 1385, 134],
                [13, 46, 5, 656, 1385, 19]])
array2 = array([[15, 44, 5, 656, 1385, 434],
                [165, 644, 5, 656, 1385, 48],
                [151, 436, 3, 356, 285, 74],
                [521, 406, 5, 656, 135, 24],
                [152, 445, 54, 56, 635, 134],
                [1812, 757, 542, 546, 185, 1834],
                [72, 77, 142, 66, 65, 64],
                [72, 727, 12, 16, 55, 634]])
array1_final = array([[112, 475, 5, 656, 1385, 134],
                      [12, 45, 5, 656, 135, 134]])
array2_final = array([[15, 44, 5, 656, 1385, 434],
                      [521, 406, 5, 656, 135, 24]])
Although array2[0] and array2[1] both match array1[4] in columns 2, 3, 4, only one of them is kept in the final array2. Similarly, array1[5] was dropped because it repeats array1[4] in those columns. The final arrays are aligned: rows that match in columns 2, 3, 4 share the same index. The remaining rows are dropped because they have no counterpart in the other array in terms of columns 2, 3, 4.
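The target condition (same shape, equal columns 2, 3, 4 at every row index) can be checked directly. A minimal sketch using the expected output above; is_aligned is a hypothetical helper name.

```python
import numpy as np


def is_aligned(a, b, cols=(2, 3, 4)):
    """Hypothetical helper: True if a and b have the same shape and agree
    on the given columns row by row."""
    cols = list(cols)
    return a.shape == b.shape and np.array_equal(a[:, cols], b[:, cols])


array1_final = np.array([[112, 475, 5, 656, 1385, 134],
                         [12, 45, 5, 656, 135, 134]])
array2_final = np.array([[15, 44, 5, 656, 1385, 434],
                         [521, 406, 5, 656, 135, 24]])
print(is_aligned(array1_final, array2_final))  # True
```

Any pair of arrays produced by steps 1 to 4 should pass this check, whatever order the matched rows end up in.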
I have an answer, although admittedly there may be a better one out there.
import numpy as np

# find the unique rows in terms of columns 2, 3, 4
# (np.unique sorts them and returns the index of each key's first occurrence)
array1_v, array1_i = np.unique(array1[:, [2, 3, 4]], axis=0, return_index=True)
array2_v, array2_i = np.unique(array2[:, [2, 3, 4]], axis=0, return_index=True)
# find whether each unique row exists in the other array
array1_in_array2 = [row.tolist() in array2_v.tolist() for row in array1_v]
array2_in_array1 = [row.tolist() in array1_v.tolist() for row in array2_v]
# final results: both follow the sorted key order, so matching rows share an index
array1_final = array1[array1_i[array1_in_array2]]
array2_final = array2[array2_i[array2_in_array1]]
>>> array1_final
array([[ 12, 45, 5, 656, 135, 134],
[ 112, 475, 5, 656, 1385, 134]])
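The membership test above converts rows to lists and uses Python's in, which compares every unique key of one array against every unique key of the other. A fully vectorized variant of the same idea is sketched below, assuming NumPy 1.13 or later. It swaps the list membership test for a joint np.unique with return_counts: since each array's unique keys are already duplicate-free, a key shared by both arrays appears exactly twice in their concatenation. match_on_columns is a hypothetical helper name.

```python
import numpy as np


def match_on_columns(a, b, cols=(2, 3, 4)):
    """Hypothetical helper: deduplicate a and b on `cols`, keep only keys present
    in both, and return the rows aligned by key (keys in lexicographic order)."""
    cols = list(cols)
    # Steps 1-2: sorted unique keys of each array, plus the first row index of each key
    ua, ia = np.unique(a[:, cols], axis=0, return_index=True)
    ub, ib = np.unique(b[:, cols], axis=0, return_index=True)
    # Step 3: label the unique keys of both arrays jointly; ua and ub are each
    # duplicate-free, so a key present in both arrays occurs exactly twice
    _, inv, counts = np.unique(np.concatenate([ua, ub]), axis=0,
                               return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)  # flatten defensively; some NumPy 2.0 releases add dims here
    shared = counts[inv] == 2
    # Step 4: ua and ub are both sorted, so the shared keys appear in the same order
    return a[ia[shared[:len(ua)]]], b[ib[shared[len(ua):]]]


array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])
array2 = np.array([[15, 44, 5, 656, 1385, 434],
                   [165, 644, 5, 656, 1385, 48],
                   [151, 436, 3, 356, 285, 74],
                   [521, 406, 5, 656, 135, 24],
                   [152, 445, 54, 56, 635, 134],
                   [1812, 757, 542, 546, 185, 1834],
                   [72, 77, 142, 66, 65, 64],
                   [72, 727, 12, 16, 55, 634]])
array1_final, array2_final = match_on_columns(array1, array2)
print(array1_final)
print(array2_final)
```

On the example data this returns the rows of array1 starting with 12 and 112, paired with the rows of array2 starting with 521 and 15. As in the answer, the matched rows come out ordered by their columns 2, 3, 4 key rather than in the order shown in the question, which still satisfies the alignment requirement.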