Reputation: 704
Consider the array sample A.
import numpy as np
A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7],
[2, 4, 3, 4, 6, 4, 9, 4],
[4, 9, 0, 1, 2, 5, 3, 0],
[5, 5, 2, 5, 4, 3, 7, 5],
[7, 5, 4, 8, 0, 1, 2, 6],
[7, 5, 4, 7, 3, 8, 0, 7]])
PROBLEM: I want to identify rows that have a specified number of DISTINCT element copies. The following code comes close: The code needs to be able to answer questions like "which rows of A have exactly 4 elements that appear twice?", or "which rows of A have exactly 1 element that appear three times?"
r,c = A.shape
nCopies = 4
s = np.sort(A,axis=1)
out = A[((s[:,1:] != s[:,:-1]).sum(axis=1)+1 == c - nCopies)]
This produces 2 output rows, both having 4 copied elements.
The 1st row has copies of 2,3,6,7. The 2nd row has copies of 3,6,7,7:
array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7]])
My problem is that I don't want the 2nd output row because it only has 3 DISTINCT copies (ie: 3,6,7) How can to code be modified to identify only distinct copies?
Upvotes: 1
Views: 69
Reputation: 2372
If I understand correctly, you want the rows of A that have 4 distinct values and every value must have at least one copy. You can leverage np.unique(return_counts=True)
which returns 2 values, the distinct values and the count of each value.
counts = [np.unique(row,return_counts=True) for row in A ]
valid_indices = [ np.all(row[1] > 1) and row[0].shape[0] == 4 for row in counts ]
valid_rows = A[valid_indices]
Upvotes: 1