Reputation: 20689
I have a 2-D array of two-character strings and want to get the unique strings in each row.
import numpy as np
np.__version__
# '1.19.2'
arr = np.array([['Z7', 'Q4', 'Q4'],   # 2 unique strings
                ['Q4', 'Z7', 'Q4'],   # 2 unique strings
                ['Q4', 'Z7', 'Z7'],   # 2 unique strings
                ['Z7', 'Z7', 'Q4'],   # 2 unique strings
                ['D8', 'D8', 'L1'],   # 2 unique strings
                ['L1', 'L1', 'D8']], dtype='<U2')   # 2 unique strings
It is guaranteed that every row contains the same number of unique strings; in my case it's 2.
Expected output:
array([['Q4', 'Z7'],
['Q4', 'Z7'],
['Q4', 'Z7'],
['Q4', 'Z7'],
['D8', 'L1'],
['D8', 'L1']], dtype='<U2')
Here, each row is sorted, but it doesn't have to be. Either way is fine.
np.apply_along_axis(np.unique, 1, arr)
# array([['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['Q4', 'Z7'],
# ['D8', 'L1'],
# ['D8', 'L1']], dtype='<U2')
I thought np.unique over axis 1 would give the expected result, but it returns this instead:
np.unique(arr, axis=1)
# array([['Q4', 'Q4', 'Z7'],
# ['Q4', 'Z7', 'Q4'],
# ['Z7', 'Z7', 'Q4'],
# ['Q4', 'Z7', 'Z7'],
# ['L1', 'D8', 'D8'],
# ['D8', 'L1', 'L1']], dtype='<U2')
I couldn't understand what exactly happened and why it returned this exact output.
Upvotes: 1
Views: 1010
Reputation: 31011
The documentation of np.unique, in the description of the axis parameter, contains the following statement:
... subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array
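So if you call np.unique, passing axis=1, then each column is treated as a single element, and the result contains the unique columns in sorted order.
A bit of explanation of why each column (not each row) is processed: axis 1 is the column axis, so indexing along it selects whole columns.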
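To make the "each column is one element" reading concrete, here is a minimal sketch on a small made-up array (the array `m` and the variable names are illustrative assumptions, not part of the question). It also shows that, as far as I can tell, the call behaves like taking the unique rows of the transpose:

```python
import numpy as np

# Hypothetical small array, chosen only for illustration.
m = np.array([[3, 1, 3],
              [4, 2, 4]])

# Indexing along axis 1 picks out a whole column.
first_column = m[:, 0]

# With axis=1 every column counts as one element: the duplicate
# column (3, 4) is dropped and the remaining columns are sorted.
unique_cols = np.unique(m, axis=1)

# The same result, written as "unique rows of the transpose".
via_transpose = np.unique(m.T, axis=0).T

print(first_column)
print(unique_cols)
print(np.array_equal(unique_cols, via_transpose))
```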
To confirm that in this case each column is the processed object, define the source array as:
arr_2 = np.array([['Z7', 'Q4', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Z7', 'Z7'],
                  ['Z7', 'Z7', 'Q4', 'Q4'],
                  ['D8', 'D8', 'L1', 'L1'],
                  ['L1', 'L1', 'D8', 'D8']])
where the last two columns are identical.
When you execute np.unique(arr_2, axis=1), the result is the same as for the original arr. The last two columns are identical, so one of them has been eliminated.
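A quick way to check this claim is the sketch below. It rebuilds arr from the question and constructs arr_2 by repeating the last column (the use of np.hstack and the names res and res_2 are my own choices, not from the posts above):

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']], dtype='<U2')

# arr_2 is arr with its last column appended once more.
arr_2 = np.hstack([arr, arr[:, -1:]])

res = np.unique(arr, axis=1)      # unique columns of the 3-column array
res_2 = np.unique(arr_2, axis=1)  # unique columns of the 4-column array

print(arr_2.shape)                 # four columns going in
print(res_2.shape)                 # three columns coming out
print(np.array_equal(res, res_2))  # the duplicate column was dropped
```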
Upvotes: 1
Reputation: 957
That is because numpy.unique, when given an axis, treats each row or column subarray as a single element and returns the unique rows (axis=0) or unique columns (axis=1), not the unique values themselves. Take a look at this example:
a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
np.unique(a, axis=0)
The output is:
array([[1, 0, 0], [2, 3, 4]])
and
b = np.array([[1, 1, 0], [1, 1, 0], [2, 2, 4]])
np.unique(b, axis=1)
The output is:
array([[0, 1],
[0, 1],
[4, 2]])
In your case you want the unique values within each row, so you should use np.apply_along_axis, as you already did. Calling np.unique with axis=1 does not remove anything here because all of your columns are distinct; it only returns them in sorted order.
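For reference, here is a minimal sketch of the per-row approach next to two equivalents. The list comprehension and the sort-based variant are my own illustrations, not something proposed in the posts above, and the sort-based one relies on the question's guarantee that every row has the same number of unique strings:

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']], dtype='<U2')

# Approach from the question: apply np.unique to every row.
per_row = np.apply_along_axis(np.unique, 1, arr)

# Equivalent explicit loop over the rows.
per_row_loop = np.array([np.unique(row) for row in arr])

# Sort-based variant (assumes equal unique counts per row):
# sort each row, then keep an entry only if it differs from its left neighbour.
s = np.sort(arr, axis=1)
keep = np.ones(s.shape, dtype=bool)
keep[:, 1:] = s[:, 1:] != s[:, :-1]
per_row_sorted = s[keep].reshape(s.shape[0], -1)

print(per_row)
```

All three give one row of unique strings per input row. The first two also work row by row when only the loop is needed; the reshape in the last one fails if rows have differing numbers of unique values.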
Upvotes: 2