Reputation: 229
I have a dataframe:
id col1 col2
0 1000 250
1 2000 750
2 1500 350
3 3000 800
4 4500 2500
5 8500 4450
6 6300 1250
I'm trying to find the set of rows that maximizes the sum of col2,
subject to the constraint that the total of those rows' col1 values
is <= 15000.
What would be the easiest way to do this?
Upvotes: 3
Views: 186
Reputation: 743
As suggested in the comments, this is the 0/1 knapsack problem, but here is a brute-force implementation of what I understand your requirement to be:
Using a powerset recipe from itertools together with pd.concat:
import pandas as pd
from itertools import chain, combinations

def powerset(iterable):
    """powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# df is the dataframe from the question. The limit is 1500 here so the
# demo output stays short; substitute 15000 for the question's constraint.
df_groups = pd.concat([df.reindex(list(l)).assign(grp=n)
                       for n, l in enumerate(powerset(df.index))
                       if df.loc[list(l), 'col1'].sum() <= 1500])
print(df_groups)
Output:
id col1 col2 grp
0 0 1000 250 1
2 2 1500 350 3
Explanation:
We use the dataframe's index to build every possible group of rows with the powerset function. enumerate labels each group, and assign writes that label into a new grp column. The result is every group of rows whose col1 sum satisfies the limit (1500 in the demo above; substitute 15000 for the question's constraint).
Reference: stackoverflow.com/questions/58119575
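For larger frames the powerset approach blows up (2^n subsets), so the standard 0/1 knapsack dynamic program is the scalable alternative. A sketch under that framing; knapsack is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1000, 2000, 1500, 3000, 4500, 8500, 6300],
                   'col2': [250, 750, 350, 800, 2500, 4450, 1250]})

def knapsack(weights, values, capacity):
    """0/1 knapsack: best[c] holds (max value, chosen row indices) for capacity c."""
    best = [(0, ())] * (capacity + 1)
    for i, (wi, vi) in enumerate(zip(weights, values)):
        # Iterate capacity downwards so each item is used at most once.
        for c in range(capacity, wi - 1, -1):
            cand = best[c - wi][0] + vi
            if cand > best[c][0]:
                best[c] = (cand, best[c - wi][1] + (i,))
    return best[capacity]

value, rows = knapsack(df['col1'].tolist(), df['col2'].tolist(), 15000)
print(value, df.loc[list(rows)])
```

This runs in O(n * capacity) time instead of O(2^n), which matters once the frame has more than a couple of dozen rows.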
Upvotes: 1