Malthus
Malthus

Reputation: 578

Algorithm - Group/sort list to maximize minimum average group value

I'm looking for some help in writing an algorithm in Python that accomplishes the following:

Given a list of real numbers, sort/group the list into n smaller lists such that the average minimum group value is maximized.

For example, consider grouping the list below into two lists - A and B, each with two elements.

lis = [1,1,2,2]

In the first scenario below, the minimum value of each list is 1, and as such the average minimum value is 1.

# Scenario 1
A = [1,2]
B = [1,2]

# Scenario 2
A = [1,1]
B = [2,2]

In the second scenario, the minimum value of A is 1, and the minimum value of B is 2, and so the average minimum value is 1.5. This arrangement is optimal.

It's clear that it's better to group values that are 'similar'. I could do this with Jenks natural breaks optimization (or one-dimensional k-means clustering). However, I'm not sure if my objective and the objective of Jenks optimization is (mathematically) equivalent.

Any help or input would be appreciated.

Edit: The smaller lists must all have the same size (assume the given list always divides into the smaller groups with no remainder).

Upvotes: 1

Views: 1587

Answers (2)

Richard
Richard

Reputation: 61289

The best way to address this is to sort the numbers from smallest to largest and then to split the sorted list into n groups without further rearrangement. Any attempt to improve on this grouping will decrease the minimum of one of the groups and, therefore, the average of the minimums.

An example may help explain why.

Given a list with 12 numbers:

[94, 82, 61, 2, 96, 34, 87, 13, 82, 91, 61, 39]

The sorted list is:

[2, 13, 34, 39, 61, 61, 82, 82, 87, 91, 94, 96]

If we want n=3 groups, those groups are then:

[[2, 13, 34, 39], [61, 61, 82, 82], [87, 91, 94, 96]]

So the average of the minimums is avg(2,61,87)=50.

Can you do better than this? The answer is no.

Moving any number from one group A to another group B will decrease the minimum of A without correspondingly increasing the minimum of B.

For instance, you might think that moving 61 to a different group would help.

One possible rearrangement is:

[[2, 13, 34, 61], [39, 61, 82, 82], [87, 91, 94, 96]]

This rearrangement has a value of avg(2,39,87)=42.

Another possible rearrangement is:

[[2, 13, 34, 39], [87, 61, 82, 82], [61, 91, 94, 96]]

This rearrangement has a value of avg(2,61,61)=41.

So you see, we cannot do better by moving 61. Likewise, we cannot do better by moving any number.

Upvotes: 1

Anthony Blackshaw
Anthony Blackshaw

Reputation: 2549

It seems as if the simplest approach is to initially sort the list so that the lowest values are always grouped together, for example:

# Define the list of values to group
values = [1, 2, 3, 10, 11, 12]

# Sort the values
values.sort()

# Split the values down into an even number of `n` groups
no_groups = 3
group_size = len(values) / no_groups
groups = []

for i in range(0, no_groups):
    groups.append(values[0:(group_size)])
    values = values[group_size:]

# Calculate the average minimum value of the groups
average_min = float(sum([g[0] for g in groups])) / no_groups

print(average_min)

But given your mention of Jenks and K-means clustering I'm concerned this is too simplistic and that I'm missing something?

Upvotes: 2

Related Questions