Convert 2d-array to 2d-array of unique values per row

Question

I have a 2d-array of shape 5x4 like this:

array([[3, 3, 3, 3],
       [3, 3, 3, 3],
       [3, 3, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

And I'd like to obtain another array that contains arrays of unique values, something like this:

array([array([3]), array([3]), array([2, 3]), array([2]), array([2])],
      dtype=object)

I obtained that with the following code:

np.array([np.unique(row) for row in matrix], dtype=object)

However, this is not vectorized. How could I achieve the same in a vectorized numpy operation?

Paddy Harrison · Accepted Answer

NumPy arrays must have a defined, rectangular shape, so if some rows have only 1 unique value and others have 2 or more, the result cannot be a regular array. A workaround is to pad the array with a known value, e.g. np.nan.
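
The padding idea can be sketched as follows. This is only a rough illustration: the helper name unique_per_row_padded is hypothetical, and it assumes numeric input without NaN values, since np.nan is used as the pad and the result is therefore cast to float.

```python
import numpy as np

def unique_per_row_padded(a):
    """Row-wise unique values, right-padded with np.nan (hypothetical helper)."""
    # Sort each row so equal values sit next to each other; cast to float to hold NaN.
    s = np.sort(a, axis=1).astype(float)
    # Mark every element that repeats its left neighbour.
    dup = np.zeros(s.shape, dtype=bool)
    dup[:, 1:] = s[:, 1:] == s[:, :-1]
    # Replace repeats with NaN, then sort again: np.sort places NaN last.
    s[dup] = np.nan
    s = np.sort(s, axis=1)
    # Drop trailing columns that are padding in every row.
    width = (~dup).sum(axis=1).max()
    return s[:, :width]

arr = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [3, 3, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]])

padded = unique_per_row_padded(arr)
# rows of padded: [3, nan], [3, nan], [2, 3], [2, nan], [2, nan]
print(padded)
```

The output keeps a rectangular shape, and np.isnan(padded) tells you which entries are padding.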

In this case np.unique will sort it all out for you if you use its axis argument. With axis=1 it compares whole columns and drops the duplicated ones, which for this data leaves the unique values of each row:

import numpy as np

arr = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [3, 3, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]])

# axis=1 keeps one copy of each distinct column
np.unique(arr, axis=1)
# array([[3, 3],
#        [3, 3],
#        [2, 3],
#        [2, 2],
#        [2, 2]])

The result is a regular 2D array that has the correct unique values for each row. Some values are duplicated, but this is the price of keeping a rectangular array.
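
One caveat is worth checking before relying on this: with axis=1, np.unique compares whole columns, not the values inside each row. It gives per-row unique values here because the example has only two distinct columns. The sketch below uses a made-up array (the name other is just for illustration) in which no two columns are equal, and compares the result with the per-row loop from the question.

```python
import numpy as np

# Hypothetical input, chosen so that no two columns are identical.
other = np.array([[1, 2, 1],
                  [3, 3, 4]])

by_column = np.unique(other, axis=1)        # deduplicates whole columns
by_row = [np.unique(row) for row in other]  # per-row unique values (loop)

print(by_column.shape)  # (2, 3): all three columns differ, so none is dropped
print(by_row)           # [array([1, 2]), array([3, 4])]
```

For data like this, the first row of by_column still contains 1 twice, so the column-wise call and the per-row loop do not agree.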
