Reputation:
I have an array of repeated elements, where each repeated element represents a class. What i would like to do is obtain the indices of the repeated elements and partition in order of the nth first elements in 3 slices. For example:
np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
split the first occurences in 3
[0, 2, 1] [2, 0, 1], [2, 1, 0, 0]
I would like to find the indices of the repeated elements and split the array in proportions of 3, where each sliced array will contain the first 3 repeated elements indices:
So for the array and it's splits, i'd like to obtain the following:
array[0, 2, 2, 1, 0, 1, 2, 1, 0, 0]
indices:[0, 1, 3], [2, 4, 5], [6, 7, 8, 9]
I've tried the following:
a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
length = np.arange(len(a))
array_set = (([length[a ==unique] for unique in np.unique(a)]))
But i can't figure how to split the partitions in order of the first occurences like the above examples.
Upvotes: 2
Views: 223
Reputation: 5939
This is a problem where np.concatenate(...)
+ some algorithm + np.split(...)
does the trick, though they are slow methods.
Lets start from concatenation and referencing indexes where you split:
classes = [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]]
split_idx = np.cumsum(list(map(len, classes[:-1])))
flat_classes = np.concatenate(classes)
Then indexes that sorts an initial array and also indexes of starts of groups are needed. In this case sorted array is [0,0,0,0,1,1,1,2,2,2]
and distinct groups start at 0, 4 and 7.
c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx = np.argsort(c)
u, cnt = np.unique(c, return_counts=True)
marker_idx = np.r_[0, np.cumsum(cnt[:-1])]
Now this is a trickiest part. It is known that one of indexes 0, 4 or 7 changes in each step (while you iterate on flat_classes
), so you can accumulate these changes in a special array called counter
which has 3 columns for each index and after that access only these indexes where changes were met:
take = np.zeros((len(flat_classes), len(u)), dtype=int)
take[np.arange(len(flat_classes)), flat_classes] = 1
counter = np.cumsum(take, axis=0)
counter = counter + marker_idx - np.ones(len(u), dtype=int)
active_idx = counter[np.arange(len(flat_classes)), flat_classes]
splittable = idx[active_idx] #remember that we are working on indices that sorts array
output = np.split(splittable, split_idx)
Output
[array([0, 1, 3], dtype=int64),
array([2, 4, 5], dtype=int64),
array([6, 7, 8, 9], dtype=int64)]
Remark: the main idea of solution is to manipulate with changes of indexes of other indexes that sorts an array. This is example of changes
for this problem:
>>> counter
array([[0, 3, 6],
[0, 3, 7],
[0, 4, 7],
[0, 4, 8],
[1, 4, 8],
[1, 5, 8],
[1, 5, 9],
[1, 6, 9],
[2, 6, 9],
[3, 6, 9]]
Upvotes: 0
Reputation: 150735
This is a way to split the array in proportions of 3, that is, the last 0
will be left out:
# unique values
uniques = np.unique(a)
# counting occurrence of each unique value
occ = np.cumsum(a == uniques[:,None], axis=1)
# maximum common occurrence
max_occ = occ.max(axis=1).min()
# masking the first occurrences
u = (occ[None,...] == (np.arange(max_occ)+1)[:,None, None])
# the indexes
idx = np.sort(np.argmax(u, axis=-1), axis=-1)
# the partitions
partitions = a[idx]
Output:
# idx
array([[0, 1, 3],
[2, 4, 5],
[6, 7, 8]])
# partitions
array([[0, 2, 1],
[2, 0, 1],
[2, 1, 0]])
Upvotes: 1