Reputation: 95
I have a 2d-array of shape 5x4 like this:
array([[3, 3, 3, 3],
       [3, 3, 3, 3],
       [3, 3, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])
And I'd like to obtain another array that contains arrays of unique values, something like this:
array([array([3]), array([3]), array([2, 3]), array([2]), array([2])],
      dtype=object)
I obtained that with the following code:
np.array([np.unique(row) for row in matrix])
However, this is not vectorized. How could I achieve the same in a vectorized numpy operation?
Upvotes: 0
Views: 142
Reputation: 221614
Here's one way to minimize the work done per iteration, which should help performance:
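b = np.sort(a, axis=1)                        # sort each row so equal values are adjacent
o = np.ones((len(a), 1), dtype=bool)          # the first element of every row is always kept
mask = np.c_[o, b[:, :-1] != b[:, 1:]]        # True where a value differs from its left neighbor
c = b[mask]                                   # per-row unique values, flattened in row order
out = np.split(c, mask.sum(1).cumsum())[:-1]  # cut at row boundaries; drop the trailing empty piece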
A loop that uses slicing could be better than np.split, because each iteration then does nothing but slice. The last step could therefore be replaced by something like this:
idx = np.r_[0, mask.sum(1).cumsum()]
out = []
for (i, j) in zip(idx[:-1], idx[1:]):
    out.append(c[i:j])
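For reference, the steps above can be put together into one self-contained sketch. The function name unique_per_row and the variable matrix are only illustrative, and the input is assumed to be a 2D array with at least one column:

```python
import numpy as np

def unique_per_row(a):
    """Return a list holding the sorted unique values of each row of a 2D array."""
    b = np.sort(a, axis=1)                    # equal values become adjacent
    first = np.ones((len(a), 1), dtype=bool)  # always keep the first element of a row
    mask = np.concatenate([first, b[:, :-1] != b[:, 1:]], axis=1)
    c = b[mask]                               # flattened unique values, row by row
    idx = np.concatenate([[0], mask.sum(axis=1).cumsum()])  # row boundaries in c
    return [c[i:j] for i, j in zip(idx[:-1], idx[1:])]

matrix = np.array([[3, 3, 3, 3],
                   [3, 3, 3, 3],
                   [3, 3, 2, 2],
                   [2, 2, 2, 2],
                   [2, 2, 2, 2]])

result = unique_per_row(matrix)
print(result)
```

The sorting and masking are done once for the whole array; only the final slicing runs in a Python loop.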
Upvotes: 1
Reputation: 2002
numpy arrays must have a defined shape, so if your data has only 1 value for some rows and 2 or more for others, a single regular array cannot hold it. A workaround is to pad the array with a known value, e.g. np.nan.
In this case np.unique will sort it all out for you if you use its axis argument. Passing axis=1 makes it compare whole columns and drop the duplicated ones, which for this data leaves the unique values of each row:
arr = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [3, 3, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]])
np.unique(arr, axis=1)
# Output:
# array([[3, 3],
#        [3, 3],
#        [2, 3],
#        [2, 2],
#        [2, 2]])
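The result is a regular array that, for this input, holds the unique values of each row. Some values are duplicated, but that is the price of keeping a rectangular array. Note that axis=1 removes duplicate columns as a whole rather than deduplicating each row independently, so this only works when the columns repeat exactly, as they do here.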
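Because np.unique with axis=1 compares entire columns, an input whose columns are all distinct comes back with nothing removed. A per-row result that still fits in one rectangular array can be sketched with the np.nan padding mentioned earlier. The helper name unique_per_row_padded and the array other are made up for illustration, and the sketch assumes numeric input without NaN values, since the result is cast to float:

```python
import numpy as np

def unique_per_row_padded(a):
    """Sorted unique values of each row, right-padded with NaN to a rectangular array."""
    b = np.sort(a, axis=1).astype(float)   # float so that NaN can be stored
    dup = np.zeros(b.shape, dtype=bool)
    dup[:, 1:] = b[:, 1:] == b[:, :-1]     # True where a value repeats its left neighbor
    b[dup] = np.nan                        # blank out the repeats
    return np.sort(b, axis=1)              # np.sort places NaN at the end of each row

arr = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [3, 3, 2, 2],
                [2, 2, 2, 2],
                [2, 2, 2, 2]])
padded = unique_per_row_padded(arr)
print(padded)

# An input whose three columns are all different from one another.
other = np.array([[1, 1, 2],
                  [3, 4, 4]])
cols = np.unique(other, axis=1)      # column-wise: no column is dropped
rows = unique_per_row_padded(other)  # row-wise: the repeated 1 and 4 become NaN
print(cols)
print(rows)
```

The output keeps the input's shape, so the NaN entries have to be filtered out (for example with np.isnan) before the values are used.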
Upvotes: 1