GEKKO Bin Balancing Optimization

I have a work-related problem that I am trying to solve using an optimization package like GEKKO. I am attempting to balance 3 bins of integers so that their sums are as close to each other as possible, if not equal.

The balancing occurs by moving integers from one bin to another.
These integers are either 1, 2, or 3 and each one has a unique identifier so we can trace in what bin it ends up.
The bins do not need to be the same length as when they started.
I cannot move part of an integer into a new bin (ex: if the bin is [2,3], I cannot move a 1 from it).
Lastly, I want to minimize the number of movements as well (ex: I would rather move a 3 from bin one than three 1's). I'm trying to think of this as a binary cost variable so if the integer stays in it's bin it costs nothing (0) but if it moves it costs 1.

I am attempting to write this in Python using GEKKO but this is my first attempt at using this type of package and a lot of it goes over my head. Ideally I want the rebalanced bins as an output to see how balancing was done. I've written what I have below to get started. It's not complete but I am just trying to get the concepts written down as I go.

Below are some example bins and ID's I have been trying to use:

bin1 = [1,2,2,1,1,1,3,1]
bin2 = [2,2,1,1,3,1,1]
bin3 = [2,1,2,1,1,1,1]

bin1_IDs=[15,28,39,111,202,300,411,415]
bin2_IDs=[2,10,56,117,132,189,367]
bin3_IDs=[44,102,211,227,351,389,406]

# Original bin contents and their sums
bins = [bin1, bin2, bin3]
original_sums = [sum(bin) for bin in bins]
num_bins = len(bins)
num_items = sum(len(bin) for bin in bins)

# Create GEKKO model
m = GEKKO(remote=False)

# Decision variables
# x[i][j] = 1 if item i is in bin j, 0 otherwise

# Create binary decision variables
x = [[m.Var(integer=True, lb=0, ub=1) for j in range(num_bins)] for i in range(num_items)]

# Ensure each item is assigned to exactly one bin
for i in range(num_items):
    m.Equation(sum(x[i][j] for j in range(num_bins)) == 1)

# Solve the optimization problem
m.solve(disp=True)

# Output the results
for j in range(num_bins):
    print(f"Bin {j+1} contents: ", end="")
    for i in range(num_items):
        if x[i][j].value[0] > 0.5:  # Item assigned to bin j
            print(items[i], end=" ")

Upvotes: 3

Answers (2)

Reinderien

Reputation: 15308

I worry about the ability of a binary integer program to scale. I propose an integer (but not binary) LP formulation. This will lose your ID information; your ID information isn't particularly meaningful to begin with. Think - the only meaningful part of a solution is, per source bin, destination bin and value, how many units move. That isn't binary, and it's not meaningful to select individual IDs as part of that solution.

import numpy as np
import pandas as pd
import pulp


def load_data() -> pd.DataFrame:
    # These integers are either 1, 2, or 3
    bin1 = (1, 2, 2, 1, 1, 1, 3, 1)
    bin2 = (2, 2, 1, 1, 3, 1, 1)
    bin3 = (2, 1, 2, 1, 1, 1, 1)
    # each one has a unique identifier so we can trace in what bin it ends up.
    bin1_ids = (15, 28, 39, 111, 202, 300, 411, 415)
    bin2_ids = (2, 10, 56, 117, 132, 189, 367)
    bin3_ids = (44, 102, 211, 227, 351, 389, 406)

    return pd.concat((
        pd.DataFrame(
            index=pd.Index(name='id', data=id_),
            data={'bin': bin_, 'value': value},
        )
        for bin_, (id_, value) in enumerate(zip(
            (bin1_ids, bin2_ids, bin3_ids),
            (bin1, bin2, bin3),
        ), start=1)
    ))


def make_vars(values: pd.DataFrame) -> tuple[
    pd.DataFrame,  # moves
    np.ndarray,    # bin IDs
    pd.DataFrame,  # old counts
    pd.Series,     # new counts
    pd.Series,     # new sums
    pulp.LpVariable,  # lowest sum
]:
    # Counts by bin and value in the original configuration
    old_counts = pd.DataFrame({
        'count': values.groupby(['bin', 'value'])['value'].count(),
    })
    bin_ids = old_counts.index.unique(level='bin')

    # Move amount combinations by old bin, new bin and value
    moves = pd.merge(
        left=old_counts.reset_index(), right=bin_ids.to_series(),
        how='cross', suffixes=('_old', '_new'),
    ).query('bin_old != bin_new')

    # Name strings for decision variables
    names = moves.agg(
        lambda row: f'{row["value"]}_from_{row["bin_old"]}_to_{row["bin_new"]}',
        axis='columns',
    )

    # Decision variables: move amounts by old bin, new bin and value
    moves['amount'] = pulp.LpVariable.matrix(
        name='n', indices=names, cat=pulp.LpInteger, lowBound=0,
    )
    move_sums = moves.groupby(['bin_old', 'value'])['amount'].sum()
    old_counts['move_from'] = move_sums

    moved_to = moves.groupby(['bin_new', 'value'])['amount'].sum()
    moved_to.index.names = 'bin', 'value'
    lhs, rhs = (old_counts['count'] - old_counts['move_from']).align(other=moved_to, fill_value=0)

    # Counts by bin and value in the new configuration
    new_counts = lhs + rhs
    new_counts.name = 'count'
    new_values = new_counts*new_counts.index.get_level_values('value')

    # Value sums by new bin
    new_sums = new_values.groupby('bin').sum()

    # For optimisation: value lower than or equal to all value sums by new bin
    lowest_sum = pulp.LpVariable(name='lowest_sum', cat=pulp.LpContinuous)

    return moves, bin_ids, old_counts, new_counts, new_sums, lowest_sum


def make_costs(
    values: pd.DataFrame,
    moves: pd.DataFrame,
    bin_ids: np.ndarray,
    old_counts: pd.DataFrame,
    lowest_sum: pulp.LpVariable,
) -> tuple[
    pulp.LpAffineExpression, pulp.LpAffineExpression,
]:
    # minimize the number of movements
    max_move_cost = old_counts['count'].sum()
    move_cost = 1/max_move_cost * pulp.lpSum(moves['amount'])

    value_mean = values['value'].sum() / len(bin_ids)
    balance_cost = value_mean - lowest_sum

    return move_cost, balance_cost


def make_problem(
    move_cost: pulp.LpAffineExpression, balance_cost: pulp.LpAffineExpression,
    move_weight: float = 1,
) -> pulp.LpProblem:
    prob = pulp.LpProblem(name='bins', sense=pulp.LpMinimize)
    prob.setObjective(move_weight*move_cost + balance_cost)
    return prob


def add_constraints(
    prob: pulp.LpProblem,
    old_counts: pd.DataFrame,
    new_sums: pd.Series,
    lowest_sum: pulp.LpVariable,
) -> None:
    # Moves cannot exceed the original count
    for (old_bin, value), row in old_counts.iterrows():
        prob.addConstraint(
            name=f'move_{value}_from_{old_bin}',
            constraint=row['move_from'] <= row['count'],
        )

    # Lowest sum must be below all bin sums
    for new_bin, total in new_sums.items():
        prob.addConstraint(
            name=f'balance_{new_bin}',
            constraint=lowest_sum <= total,
        )


def solve(
    prob: pulp.LpProblem,
    moves: pd.DataFrame,
    old_counts: pd.DataFrame,
    new_counts: pd.Series,
    new_sums: pd.Series,
    lowest_sum: pulp.LpVariable,
    move_cost: pulp.LpAffineExpression,
    balance_cost: pulp.LpAffineExpression,
) -> tuple[pd.Series, pd.Series, float, float, float]:
    prob.solve()
    if prob.status != pulp.LpStatusOptimal:
        raise ValueError(prob.status)

    moves['amount'] = moves['amount'].apply(pulp.value)
    old_counts['move_from'] = old_counts['move_from'].apply(pulp.value)
    new_counts = new_counts.apply(pulp.value)
    new_sums = new_sums.apply(pulp.value)
    lowest_sum = lowest_sum.value()
    move_cost = move_cost.value()
    balance_cost = balance_cost.value()

    return new_counts, new_sums, lowest_sum, move_cost, balance_cost


def dump(
    move_cost: float,
    balance_cost: float,
    values: pd.DataFrame,
    old_counts: pd.DataFrame,
    new_counts: pd.Series,
    new_sums: pd.Series,
    moves: pd.DataFrame,
) -> None:
    print(f'Move and balance costs: {move_cost:.3f}, {balance_cost:.3f}')
    print('Old bin sums')
    print(values.groupby('bin')['value'].sum())
    print('New bin sums')
    print(new_sums)
    print('Transform via')
    print(moves[moves['amount'] > 0.5])
    print('Transform from')
    print(old_counts)
    print('Transform to')
    print(new_counts)


def main() -> None:
    values = load_data()
    moves, bin_ids, old_counts, new_counts, new_sums, lowest_sum = make_vars(values=values)
    move_cost, balance_cost = make_costs(
        values=values, moves=moves, bin_ids=bin_ids, old_counts=old_counts, lowest_sum=lowest_sum,
    )
    prob = make_problem(move_cost=move_cost, balance_cost=balance_cost)
    add_constraints(prob=prob, old_counts=old_counts, new_sums=new_sums, lowest_sum=lowest_sum)
    print(prob)

    new_counts, new_sums, lowest_sum, move_cost, balance_cost = solve(
        prob=prob, moves=moves, old_counts=old_counts, new_counts=new_counts, new_sums=new_sums,
        lowest_sum=lowest_sum, move_cost=move_cost, balance_cost=balance_cost,
    )
    dump(
        move_cost=move_cost, balance_cost=balance_cost,
        values=values, moves=moves, old_counts=old_counts, new_counts=new_counts, new_sums=new_sums,
    )
    print()


if __name__ == '__main__':
    main()

bins:
MINIMIZE
-1*lowest_sum + 0.045454545454545456*n_1_from_1_to_2 + 0.045454545454545456*n_1_from_1_to_3 ...

SUBJECT TO
move_1_from_1: n_1_from_1_to_2 + n_1_from_1_to_3 <= 5

move_2_from_1: n_2_from_1_to_2 + n_2_from_1_to_3 <= 2

move_3_from_1: n_3_from_1_to_2 + n_3_from_1_to_3 <= 1

move_1_from_2: n_1_from_2_to_1 + n_1_from_2_to_3 <= 4

move_2_from_2: n_2_from_2_to_1 + n_2_from_2_to_3 <= 2

move_3_from_2: n_3_from_2_to_1 + n_3_from_2_to_3 <= 1

move_1_from_3: n_1_from_3_to_1 + n_1_from_3_to_2 <= 5

move_2_from_3: n_2_from_3_to_1 + n_2_from_3_to_2 <= 2

balance_1: lowest_sum + n_1_from_1_to_2 + n_1_from_1_to_3 - n_1_from_2_to_1
 - n_1_from_3_to_1 + 2 n_2_from_1_to_2 + 2 n_2_from_1_to_3 - 2 n_2_from_2_to_1
 - 2 n_2_from_3_to_1 + 3 n_3_from_1_to_2 + 3 n_3_from_1_to_3
 - 3 n_3_from_2_to_1 <= 12

balance_2: lowest_sum - n_1_from_1_to_2 + n_1_from_2_to_1 + n_1_from_2_to_3
 - n_1_from_3_to_2 - 2 n_2_from_1_to_2 + 2 n_2_from_2_to_1 + 2 n_2_from_2_to_3
 - 2 n_2_from_3_to_2 - 3 n_3_from_1_to_2 + 3 n_3_from_2_to_1
 + 3 n_3_from_2_to_3 <= 11

balance_3: lowest_sum - n_1_from_1_to_3 - n_1_from_2_to_3 + n_1_from_3_to_1
 + n_1_from_3_to_2 - 2 n_2_from_1_to_3 - 2 n_2_from_2_to_3 + 2 n_2_from_3_to_1
 + 2 n_2_from_3_to_2 - 3 n_3_from_1_to_3 - 3 n_3_from_2_to_3 <= 9

VARIABLES
lowest_sum free Continuous
0 <= n_1_from_1_to_2 Integer
0 <= n_1_from_1_to_3 Integer
0 <= n_1_from_2_to_1 Integer
0 <= n_1_from_2_to_3 Integer
0 <= n_1_from_3_to_1 Integer
0 <= n_1_from_3_to_2 Integer
0 <= n_2_from_1_to_2 Integer
0 <= n_2_from_1_to_3 Integer
0 <= n_2_from_2_to_1 Integer
0 <= n_2_from_2_to_3 Integer
0 <= n_2_from_3_to_1 Integer
0 <= n_2_from_3_to_2 Integer
0 <= n_3_from_1_to_2 Integer
0 <= n_3_from_1_to_3 Integer
0 <= n_3_from_2_to_1 Integer
0 <= n_3_from_2_to_3 Integer

Move and balance costs: 0.045, 0.667

Old bin sums
bin
1    12
2    11
3     9
Name: value, dtype: int64

New bin sums
bin
1    12.0
2    10.0
3    10.0
dtype: float64

Transform via
    bin_old  value  count  bin_new  amount
11        2      1      4        3     1.0

Transform from
           count  move_from
bin value                  
1   1          5        0.0
    2          2        0.0
    3          1        0.0
2   1          4        1.0
    2          2        0.0
    3          1        0.0
3   1          5        0.0
    2          2        0.0

Transform to
bin  value
1    1        5.0
     2        2.0
     3        1.0
2    1        3.0
     2        2.0
     3        1.0
3    1        6.0
     2        2.0
     3        0.0
Name: count, dtype: float64

Upvotes: 0

AirSquid

Reputation: 11938

Here's a hack at this using pyomo which is an alternative package for linear programming. I'm sure this could be transcribed into GEKKO if that is required, I'm just not as familiar with the syntax of GEKKO.

This is basically a large Binary Integer Program, where the binary variables represent the move (or non-move) of each item, so the key variable is x[item, orig, final] where the pairs of orig, final are reduced to only what is possible for that item to more tightly constrain the problem.

The key metric for the objective is a mini-max where we reduce the value of the highest bin and therefore the bin values get "squished" evenly. A slight penalty is added for moves, which must be weighted carefully wrt the max bin value.

Of note, if your "real problem" is vastly larger than this, other techniques may be required as this may not scale to huge problem sizes, but it solves this one in a blink

Code:

import pyomo.environ as pyo

# DATA
bin1 = [1, 2, 2, 1, 1, 1, 3, 1]
bin2 = [2, 2, 1, 1, 3, 1, 1]
bin3 = [2, 1, 2, 1, 1, 1, 1]
# bin1 = [0, 0, 0, 0, 0, 0, 0, 0]

bin1_IDs = [15, 28, 39, 111, 202, 300, 411, 415]
bin2_IDs = [2, 10, 56, 117, 132, 189, 367]
bin3_IDs = [44, 102, 211, 227, 351, 389, 406]

values = {}
values.update(dict(zip(bin1_IDs, bin1)))
values.update(dict(zip(bin2_IDs, bin2)))
values.update(dict(zip(bin3_IDs, bin3)))

orig_loc = {}
orig_loc.update({item: "b1" for item in bin1_IDs})
orig_loc.update({item: "b2" for item in bin2_IDs})
orig_loc.update({item: "b3" for item in bin3_IDs})

bins = ["b1", "b2", "b3"]
poss_moves = {
    item: {(orig, other) for other in bins} for item, orig in orig_loc.items()
}

all_moves = {(b, bb) for b in bins for bb in bins}
non_moves = {(b, b) for b in bins}


# MODEL
m = pyo.ConcreteModel("bin_shuffler")

# SETS
m.I = pyo.Set(initialize=values.keys(), doc="items")
m.B = pyo.Set(initialize=bins, doc="bin")
m.M = pyo.Set(within=m.B * m.B, initialize=all_moves, doc="moves")

# indexed set of the item : possible moves for that item
m.IM = pyo.Set(m.I, domain=m.M, initialize=poss_moves, doc="possible moves")
# a "flat" set of same for use in variable indexing
m.IM_flat = pyo.Set(
    domain=m.I * m.M, initialize={(item, move) for item in m.IM for move in m.IM[item]}
)

# PARAMS
m.value = pyo.Param(m.I, initialize=values)
move_cost = 0.01  # very small weight such that it never eclipses the max_bin value objective

# VARS
# item i makes move m
m.x = pyo.Var(m.IM_flat, domain=pyo.Binary)
m.max_value = pyo.Var()
m.move_count = pyo.Var()

# OBJ
# mini-max:  minimize the max sum of largest bin, and minimize moves w/ small penalty
m.obj = pyo.Objective(expr=m.max_value + move_cost * m.move_count, sense=pyo.minimize)

# CONSTRAINTS

# all items must make a move
@m.Constraint(m.I)
def make_move(m, i):
    return sum(m.x[i, move] for move in m.IM[i]) == 1


# count all of the moves that are not non-moves
m.set_move_count = pyo.Constraint(
    expr=m.move_count
    >= sum(m.x[i, move] for i in m.I for move in m.IM[i] if move not in non_moves)
)


# catch the max bin value
@m.Constraint(m.B)
def max_bin(m, bin):
    return m.max_value >= sum(
        m.value[i] * m.x[i, orig, final]
        for i in m.I
        for (orig, final) in m.IM[i]
        if final == bin
    )

# inspect it...
m.pprint()

# solve it...
opt = pyo.SolverFactory("appsi_highs")
res = opt.solve(m, tee=True)

# report it...
if pyo.check_optimal_termination(res):
    print(res)
    m.display()
    for i in m.I:
        for orig, final in m.IM[i]:
            if pyo.value(m.x[i, orig, final]) > 0.5 and orig != final:
                print(f"move {i} from {orig} to {final}")

    for b in m.B:
        # get the sum
        bin_total = sum(
            m.value[i] * pyo.value(m.x[i, orig, final])
            for i in m.I
            for (orig, final) in m.IM[i]
            if final == b
        )
        print(f"total value of bin {b}: {bin_total}")
else:
    print("solve failed")

final result (which is not unique):

move 415 from b1 to b3
total value of bin b1: 11.0
total value of bin b2: 11.0
total value of bin b3: 10.0

Upvotes: 0

GEKKO Bin Balancing Optimization

Answers (2)

Code:

final result (which is not unique):

Related Questions