user14438759
user14438759

Reputation:

Splitting a sorted array of repeated elements

I have an array of repeated elements, where each repeated element represents a class. What i would like to do is obtain the indices of the repeated elements and partition in order of the nth first elements in 3 slices. For example:

np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])

split the first occurences in 3

[0, 2, 1] [2, 0, 1], [2, 1, 0, 0]

I would like to find the indices of the repeated elements and split the array in proportions of 3, where each sliced array will contain the first 3 repeated elements indices:

So for the array and it's splits, i'd like to obtain the following:

array[0, 2, 2, 1, 0, 1, 2, 1, 0, 0]

indices:[0, 1, 3], [2, 4, 5], [6, 7, 8, 9]

I've tried the following:

a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
length = np.arange(len(a))
array_set = (([length[a ==unique] for unique in np.unique(a)]))

But i can't figure how to split the partitions in order of the first occurences like the above examples.

Upvotes: 2

Views: 223

Answers (2)

mathfux
mathfux

Reputation: 5939

This is a problem where np.concatenate(...) + some algorithm + np.split(...) does the trick, though they are slow methods.

Lets start from concatenation and referencing indexes where you split:

classes = [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]]
split_idx = np.cumsum(list(map(len, classes[:-1])))
flat_classes = np.concatenate(classes)

Then indexes that sorts an initial array and also indexes of starts of groups are needed. In this case sorted array is [0,0,0,0,1,1,1,2,2,2] and distinct groups start at 0, 4 and 7.

c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx = np.argsort(c)
u, cnt = np.unique(c, return_counts=True)
marker_idx = np.r_[0, np.cumsum(cnt[:-1])]

Now this is a trickiest part. It is known that one of indexes 0, 4 or 7 changes in each step (while you iterate on flat_classes), so you can accumulate these changes in a special array called counter which has 3 columns for each index and after that access only these indexes where changes were met:

take = np.zeros((len(flat_classes), len(u)), dtype=int)
take[np.arange(len(flat_classes)), flat_classes] = 1
counter = np.cumsum(take, axis=0)
counter = counter + marker_idx - np.ones(len(u), dtype=int)
active_idx = counter[np.arange(len(flat_classes)), flat_classes]
splittable = idx[active_idx] #remember that we are working on indices that sorts array
output = np.split(splittable, split_idx)

Output

[array([0, 1, 3], dtype=int64),
 array([2, 4, 5], dtype=int64),
 array([6, 7, 8, 9], dtype=int64)]

Remark: the main idea of solution is to manipulate with changes of indexes of other indexes that sorts an array. This is example of changes for this problem:

>>> counter
array([[0, 3, 6],
       [0, 3, 7],
       [0, 4, 7],
       [0, 4, 8],
       [1, 4, 8],
       [1, 5, 8],
       [1, 5, 9],
       [1, 6, 9],
       [2, 6, 9],
       [3, 6, 9]]

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150735

This is a way to split the array in proportions of 3, that is, the last 0 will be left out:

# unique values
uniques = np.unique(a)

# counting occurrence of each unique value 
occ = np.cumsum(a == uniques[:,None], axis=1)

# maximum common occurrence
max_occ = occ.max(axis=1).min()

# masking the first occurrences 
u = (occ[None,...] == (np.arange(max_occ)+1)[:,None, None])

# the indexes
idx = np.sort(np.argmax(u, axis=-1), axis=-1)

# the partitions
partitions = a[idx]

Output:

# idx
array([[0, 1, 3],
       [2, 4, 5],
       [6, 7, 8]])

# partitions
array([[0, 2, 1],
       [2, 0, 1],
       [2, 1, 0]])

Upvotes: 1

Related Questions