Leo

Reputation: 89

Remove entire sub array from multi-dimensional array if any element in array is duplicate

I have a multi-dimensional array in Python in which the same integer may appear in more than one vector (row). For example:

array = [[1,2,3,4],
         [2,9,12,4],
         [5,6,7,8],
         [6,8,12,13]]

I would like to completely remove every vector that contains any element that has appeared previously. In this case, vectors [2,9,12,4] and [6,8,12,13] should be removed because each has an element (2 and 6, respectively) that appeared in a previous vector within the array. Note that [6,8,12,13] contains two elements that have appeared previously, so the code should handle such cases as well.

The resulting array should end up being:

array = [[1,2,3,4],
         [5,6,7,8]]

I thought I could achieve this with np.unique(array, axis=0), but it does not handle this case, and I couldn't find another function that takes care of this particular kind of uniqueness.
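
For illustration, here is a minimal sketch of why np.unique(array, axis=0) is not enough on its own: it treats each row as a single item and only drops rows that are identical as a whole, so all four rows of the example survive. The variable name result is used only for this sketch.

```python
import numpy as np

array = np.array([[1, 2, 3, 4],
                  [2, 9, 12, 4],
                  [5, 6, 7, 8],
                  [6, 8, 12, 13]])

# axis=0 compares whole rows, so only exact duplicate rows would be removed
result = np.unique(array, axis=0)
print(result.shape)  # (4, 4): no row was dropped
```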

Any thoughts are appreciated.

Upvotes: 0

Views: 180

Answers (2)

mathfux

Reputation: 5949

You can work with an array that pairs each number with the index of the row it came from, sorted by number. For the example it looks like this:

number_info = array([[ 0,  1],
                     [ 0,  2],
                     [ 1,  2],
                     [ 0,  3],
                     [ 0,  4],
                     [ 1,  4],
                     [ 2,  5],
                     [ 2,  6],
                     [ 3,  6],
                     [ 2,  7],
                     [ 2,  8],
                     [ 3,  8],
                     [ 1,  9],
                     [ 1, 12],
                     [ 3, 12],
                     [ 3, 13]])

Each entry is [row index, number]. Entries remove_idx = [2, 5, 8, 11, 14] of this array repeat a number that already appeared in an earlier row, and they point to rows remove_rows = [1, 1, 3, 3, 3] of the original array, which are the rows to delete. Now, the code:

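import numpy as np

array = np.asarray(array)  # the steps below need an ndarray, not a list of lists
# Pair every value with the index of the row it came from
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
# Sort by value; a stable sort keeps equal values in row order
number_info = number_info[np.argsort(number_info[:,1], kind='stable')]
# Flag entries equal to their predecessor that come from a later row
remove_idx = np.where((np.diff(number_info[:,1])==0) &
                      (np.diff(number_info[:,0])>0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)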

Output:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Upvotes: 1

Samwise

Reputation: 71517

Here's a quick way to do it with a list comprehension and set intersections:

>>> array = [[1,2,3,4],
...          [2,9,12,4],
...          [5,6,7,8],
...          [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
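
As a possible variation (a sketch, not part of the answer above), the same result can be produced in one pass by keeping a running set of every value seen so far, which avoids rebuilding sets for all earlier vectors at each step. The names drop_repeating_rows, seen, and kept are hypothetical, and the sketch assumes, like the comprehension above, that values from removed vectors still count as already seen.

```python
def drop_repeating_rows(rows):
    """Keep only rows that share no value with any earlier row (kept or removed)."""
    seen = set()  # every value encountered so far
    kept = []
    for row in rows:
        values = set(row)
        if seen.isdisjoint(values):
            kept.append(row)
        # Record the values even when the row is dropped, matching array[:i] above
        seen |= values
    return kept


array = [[1, 2, 3, 4],
         [2, 9, 12, 4],
         [5, 6, 7, 8],
         [6, 8, 12, 13]]
print(drop_repeating_rows(array))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

If values from removed vectors should not block later vectors, the update of seen would instead go inside the if branch; which behavior is wanted depends on how "appeared previously" is read.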

Upvotes: 0
