tmakino
tmakino

Reputation: 648

Splitting values into similarly distributed evenly sized groups

Given a list of scalar values, how can we split the list into K evenly-sized groups such that the groups have similar distributions? Note that simplicity is strongly favored over efficiency.

I am currently doing:

sort values
create K empty groups: group_1, ... group_k
while values is not empty:
    for group in groups:
        group.add(values.pop())
        if values is empty:
            break

Upvotes: 5

Views: 2207

Answers (2)

btilly
btilly

Reputation: 46409

This is a variation on what @m.raynal came up with that will work well even when n is just a fairly small multiple of k.

  1. Sort the elements from smallest to largest.
  2. Create k empty groups.
  3. Put them into a Priority Queue sorted from least elements to most, then largest sum to smallest. (So the next element is always the one with the largest sum among all of those with the fewest elements.)
  4. For each element, take a group off of the priority queue, add that element, put the group back in the priority queue.

In practice this means that the first k elements go to groups randomly, the next k elements go in reverse order. And then it gets clever about keeping things balanced.

Depending on your application, the fact that the bottom two values are spaced predictably far apart could be a problem. If that is the case then you could complicate this by going "middle out". But that scheme is much more complicated.

Upvotes: 2

m.raynal
m.raynal

Reputation: 3113

Here is a way to (somehow) distribute values evenly. Let's assume your array of scalars A is of size n, with n being a multiple of k to make it more simple. One way could then be :

sort(A)
d = n/k
g = 0
for i from 0 to d-1 do {
  for j from 0 to k-1 do {
    group[(j+g) % k].add(A[k*i + j])
  }
  g ++
}

You then add the first k elements to the groups 1, ..., k, the k following to the groups 2, ..., k, 1, then 3, ...k, 1, 2 etc.

It would not work well if k² > n, in this case you should not increment g by 1, but by a larger value close to k/d. If k is almost n, then this algorithm becomes simply useless.

This gives absolutely no guarantee about an even distribution of the scalars if some extreme values were to be in A. But in the case A itself would be somehow well distributed, and n > k², then it would somehow distribute the values among the k groups.

It has at least the advantage of running in O(n) once A is sorted.

Upvotes: 2

Related Questions