Finding unique values in each row

Question

I have an array of two-character strings and want to get the unique strings in each row.

import numpy as np

np.__version__
# '1.19.2'
arr = np.array([['Z7', 'Q4', 'Q4'], # 2 unique strings
                ['Q4', 'Z7', 'Q4'], # 2 unique strings
                ['Q4', 'Z7', 'Z7'], # 2 unique strings
                ['Z7', 'Z7', 'Q4'], # 2 unique strings
                ['D8', 'D8', 'L1'], # 2 unique strings
                ['L1', 'L1', 'D8']], dtype='<U2')


It is guaranteed that every row contains the same number of unique strings; in my case, it's 2.
Expected output:
array([['Q4', 'Z7'],
       ['Q4', 'Z7'],
       ['Q4', 'Z7'],
       ['Q4', 'Z7'],
       ['D8', 'L1'],
       ['D8', 'L1']], dtype='<U2')

Here, each row is sorted, but it doesn't have to be; either order is fine.
My code:
np.apply_along_axis(np.unique, 1, arr)

# array([['Q4', 'Z7'],
#        ['Q4', 'Z7'],
#        ['Q4', 'Z7'],
#        ['Q4', 'Z7'],
#        ['D8', 'L1'],
#        ['D8', 'L1']], dtype='<U2')
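Because every row is guaranteed to have the same number of unique values, the per-row result can also be computed without calling np.unique once per row. The following is a minimal sketch under exactly that assumption; the helper name `unique_per_row` is hypothetical, not a NumPy function. It sorts each row, keeps every value that differs from its left neighbour, and reshapes the kept values back into rows.

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']])

def unique_per_row(a):
    """Sorted unique values of each row of a 2-D array (hypothetical helper).

    Assumes every row has the same number of unique values.
    """
    s = np.sort(a, axis=1)                  # sort each row independently
    keep = np.ones(s.shape, dtype=bool)     # first value of each row is always kept
    keep[:, 1:] = s[:, 1:] != s[:, :-1]     # keep values that differ from their left neighbour
    # Boolean indexing returns the kept values in row-major order, row by row.
    return s[keep].reshape(a.shape[0], -1)

result = unique_per_row(arr)
```

The sketch relies entirely on the guarantee from the question: if rows had different numbers of unique values, the final reshape would either raise or silently mix values from different rows.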

I thought np.unique over axis 1 would give the expected result, but
np.unique(arr, axis=1)
# array([['Q4', 'Q4', 'Z7'],
#        ['Q4', 'Z7', 'Q4'],
#        ['Z7', 'Z7', 'Q4'],
#        ['Q4', 'Z7', 'Z7'],
#        ['L1', 'D8', 'D8'],
#        ['D8', 'L1', 'L1']], dtype='<U2')

I couldn't understand what happened here or why it returned this particular output.

Valdi_Bo · Accepted Answer

The documentation of np.unique, in the description of the axis parameter, contains the following statement:

... the subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array

So if you call np.unique, passing axis=1, then:

Each column is flattened (since each column already holds only "atomic" values, nothing changes).
Unique elements are then found among the resulting list of columns: if two columns were identical, only one of them would be retained.
The unique columns are returned sorted in lexicographic order, which is why the columns appear rearranged in the output.
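The three steps above can be reproduced by hand to see where the output comes from. This is a minimal illustration, not NumPy's actual implementation: it uses plain Python tuples to stand in for "each column treated as one element", removes duplicates with a set, and sorts the survivors lexicographically.

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']])

# Treat each column arr[:, j] as a single element (a tuple of strings)
# and drop duplicate columns.
columns = {tuple(arr[:, j].tolist()) for j in range(arr.shape[1])}

# Sort the unique columns lexicographically, then stack them back side by side.
manual = np.array(sorted(columns)).T

builtin = np.unique(arr, axis=1)
```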

Why columns and not rows? For a 2-D array, the subarrays indexed along axis 1 are the columns arr[:, j], so each element that np.unique(arr, axis=1) compares is a whole column.
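One way to see that axis=1 means columns is to transpose: the unique rows of `arr.T` are exactly the unique columns of `arr`. A short sketch comparing both, with `axis=0` added for contrast:

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']])

# Transposing turns columns into rows, so deduplicating rows of arr.T
# and transposing back gives the same result as axis=1 on arr.
via_transpose = np.unique(arr.T, axis=0).T
direct = np.unique(arr, axis=1)

# axis=0 deduplicates whole rows instead. All six rows of arr are distinct,
# so nothing is removed; the rows only come back in sorted order.
rows = np.unique(arr, axis=0)
```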

To confirm that each column is the object being processed here, define the source array as:

arr_2 = np.array([['Z7', 'Q4', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Z7', 'Z7'],
                  ['Z7', 'Z7', 'Q4', 'Q4'],
                  ['D8', 'D8', 'L1', 'L1'],
                  ['L1', 'L1', 'D8', 'D8']])

where the last two columns are identical.

When you execute np.unique(arr_2, axis=1), the result is exactly the same as before: the last two columns were identical, so one of them was eliminated.
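The eliminated duplicate can be made visible with the `return_counts` parameter of np.unique, which reports how many original columns collapsed into each unique one. A small sketch:

```python
import numpy as np

arr = np.array([['Z7', 'Q4', 'Q4'],
                ['Q4', 'Z7', 'Q4'],
                ['Q4', 'Z7', 'Z7'],
                ['Z7', 'Z7', 'Q4'],
                ['D8', 'D8', 'L1'],
                ['L1', 'L1', 'D8']])

arr_2 = np.array([['Z7', 'Q4', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Q4', 'Q4'],
                  ['Q4', 'Z7', 'Z7', 'Z7'],
                  ['Z7', 'Z7', 'Q4', 'Q4'],
                  ['D8', 'D8', 'L1', 'L1'],
                  ['L1', 'L1', 'D8', 'D8']])

# counts[k] is the number of columns of arr_2 equal to uniq[:, k].
uniq, counts = np.unique(arr_2, axis=1, return_counts=True)
# counts -> [2, 1, 1]: the first unique column (the duplicated last two
# columns of arr_2) occurs twice.
```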
