I have the following dataframe with weights:
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 0.1, 0.3], 'b': [0.2, 0.4, 0.2, 0.2], 'c': [0.3, 0.2, 0.4, 0.1],
                   'd': [0.1, 0.1, 0.1, 0.7], 'e': [0.2, 0.1, 0.3, 0.4], 'f': [0.7, 0.1, 0.1, 0.1]})
and then I normalize each row using:
df = df.div(df.sum(axis=1), axis=0)
I want to optimize the normalized weights of each row such that no weight is less than 0 or greater than 0.4.
If a weight is greater than 0.4, it will be clipped to 0.4 and the excess will be distributed to the other entries pro rata (meaning the second-largest weight receives the largest share and moves toward 0.4, and any weight still left over is distributed to the third-largest, and so on).
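To see why a single clip-and-redistribute pass is not always enough, here is a small sketch on a made-up three-entry row (not taken from the data above); `redistribute_once` is a hypothetical helper. One pass pushes the second entry past 0.4, and a second pass hands its excess on to the third entry, which is the cascade described above.

```python
import numpy as np


def redistribute_once(row: np.ndarray, max_weight: float) -> np.ndarray:
    """One pass: cap overweight entries and give the excess pro rata to the
    entries that are still below the cap (hypothetical helper)."""
    row = row.copy()
    overweight = row > max_weight
    below_cap = row < max_weight
    excess = row[overweight].sum() - max_weight * overweight.sum()
    row[below_cap] += excess / row[below_cap].sum() * row[below_cap]
    row[overweight] = max_weight
    return row


row = np.array([0.5, 0.38, 0.12])  # made-up row that already sums to 1
first = redistribute_once(row, 0.4)
print(first)  # ≈ [0.4, 0.456, 0.144]: the second entry is now over 0.4

result, passes = first, 1
while (result > 0.4).any():  # repeat until nothing is over the cap
    result = redistribute_once(result, 0.4)
    passes += 1
print(result, passes)  # ≈ [0.4, 0.4, 0.2] after 2 passes
```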
Can this be done using the "optimize" function?
Thank you.
UPDATE: I would also like to set a minimum bound for the weights. In my original question the minimum bound was implicitly zero; however, I would like to add a constraint so that every weight is at least 0.05, for example.
Unfortunately, I can only find a loop solution to this problem. When you trim off the excess weight and redistribute it proportionally, some of the entries that were under the limit may be pushed over it. Those then have to be trimmed off too, and the cycle keeps repeating until no value is overweight. The same goes for underweight values.
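If only the maximum bound matters (the original question, before the UPDATE), the repeated pro-rata redistribution always multiplies the still-uncapped weights by one common factor, so it should end at min(c·w, max_weight) for some scalar c chosen so the row sums to 1. A minimal sketch that finds c by bisection instead of looping until nothing is overweight; `cap_weights_bisection` is a hypothetical helper, it assumes the row is non-negative and already normalized, and it does not handle the minimum bound.

```python
import numpy as np


def cap_weights_bisection(w: np.ndarray, max_weight: float, iters: int = 100) -> np.ndarray:
    """Return min(c * w, max_weight), with the scalar c chosen so the result sums to 1.

    Assumes w is non-negative and sums to 1 (max bound only, no minimum).
    """
    w = np.asarray(w, dtype=float)
    positive = w[w > 0]
    if len(positive) * max_weight < 1:
        raise ValueError("max_weight is too small for the number of non-zero weights")
    lo = 1.0  # at c = 1 the capped sum is at most sum(w) == 1
    hi = max_weight / positive.min()  # at this c every non-zero entry is capped, sum >= 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if np.minimum(mid * w, max_weight).sum() < 1:
            lo = mid
        else:
            hi = mid
    return np.minimum(hi * w, max_weight)


row = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.7])  # first row of the question's data
row = row / row.sum()
print(cap_weights_bisection(row, 0.4))  # ≈ [1/15, 2/15, 3/15, 1/15, 2/15, 0.4]
```

Bisection works here because the capped sum is non-decreasing in c, so there is a single crossing point to search for.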
import numpy as np
import pandas as pd

# The original data frame. No normalization yet
df = pd.DataFrame(
    {
        "a": [0.1, 0.5, 0.1, 0.3],
        "b": [0.2, 0.4, 0.2, 0.2],
        "c": [0.3, 0.2, 0.4, 0.1],
        "d": [0.1, 0.1, 0.1, 0.7],
        "e": [0.2, 0.1, 0.3, 0.4],
        "f": [0.7, 0.1, 0.1, 0.1],
    }
)
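def ensure_min_weight(row: np.ndarray, min_weight: float) -> None:
    """Raise entries below min_weight to min_weight, in place, taking the
    missing weight pro rata from entries still above min_weight."""
    while True:
        underweight = row < min_weight
        if not underweight.any():
            break
        # Entries already sitting exactly at the floor must not give up weight
        donors = row > min_weight
        missing_weight = min_weight * underweight.sum() - row[underweight].sum()
        row[donors] -= missing_weight / row[donors].sum() * row[donors]
        row[underweight] = min_weight


def ensure_max_weight(row: np.ndarray, max_weight: float) -> None:
    """Cap entries above max_weight at max_weight, in place, giving the
    excess pro rata to entries still below max_weight."""
    while True:
        overweight = row > max_weight
        if not overweight.any():
            break
        # Entries already sitting exactly at the cap must not receive more weight
        receivers = row < max_weight
        excess_weight = row[overweight].sum() - max_weight * overweight.sum()
        row[receivers] += excess_weight / row[receivers].sum() * row[receivers]
        row[overweight] = max_weight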
values = df.to_numpy()
normalized = values / values.sum(axis=1)[:, None]

min_weight = 0.15  # just for fun
max_weight = 0.4

for i in range(len(values)):
    # normalized[i] is a view, so the ensure_* functions update normalized in place
    row = normalized[i]
    ensure_min_weight(row, min_weight)
    ensure_max_weight(row, max_weight)
# Normalized weight
assert np.isclose(normalized.sum(axis=1), 1).all(), "Normalized weight must sum up to 1"
assert ((min_weight <= normalized) & (normalized <= max_weight)).all(), f"Normalized weight must be between {min_weight} and {max_weight}"
print(pd.DataFrame(normalized, columns=df.columns))
# Raw values
# values = normalized * values.sum(axis=1)[:, None]
# print(pd.DataFrame(values, columns=df.columns))
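Note that this algorithm cannot produce valid weights if min_weight and max_weight are infeasible for the number of columns: try min_weight = 0.4 and max_weight = 0.5 (six columns at 0.4 or more already add up to at least 2.4). Depending on the values, it can get stuck in an infinite loop or finish with weights that do not sum to 1. You should handle these errors in the 2 ensure functions.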
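One way to catch the illogical case is to check feasibility once, before running the loops: n weights that sum to 1 can only all lie in [min_weight, max_weight] if n·min_weight ≤ 1 ≤ n·max_weight. A minimal sketch; `check_bounds` is a hypothetical helper, which you would call with `len(df.columns)` before the `for` loop.

```python
def check_bounds(n_columns: int, min_weight: float, max_weight: float) -> None:
    """Raise ValueError if n_columns weights summing to 1 cannot all lie in
    [min_weight, max_weight] (hypothetical helper)."""
    if min_weight > max_weight:
        raise ValueError(f"min_weight {min_weight} is larger than max_weight {max_weight}")
    if n_columns * min_weight > 1:
        raise ValueError(f"{n_columns} columns at min_weight {min_weight} already sum to more than 1")
    if n_columns * max_weight < 1:
        raise ValueError(f"{n_columns} columns at max_weight {max_weight} cannot reach a sum of 1")


check_bounds(6, 0.15, 0.4)  # passes: 6 * 0.15 <= 1 <= 6 * 0.4
try:
    check_bounds(6, 0.4, 0.5)  # 6 * 0.4 = 2.4 > 1
except ValueError as err:
    print(err)
```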