Reputation: 229
I have a dataframe:
id col1 col2
0 1000 250
1 2000 750
2 1500 350
3 3000 800
4 4500 2500
5 8500 4450
6 6300 1250
I'm trying to find the set of rows that maximizes the sum of col2,
subject to the constraint that the total of those rows' col1 values
is <= 15000.
What would be the easiest way to do this?
Upvotes: 3
Views: 186
Reputation: 743
As suggested in the comments, this is the 0/1 knapsack problem, but here is a brute-force implementation of what I understand your requirement to be:
Using a powerset recipe from itertools together with pd.concat:
import pandas as pd
from itertools import chain, combinations

def powerset(iterable):
    """powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# df is the dataframe from the question. The limit is 1500 here so the
# demo output stays short; substitute 15000 for the question's constraint.
df_groups = pd.concat([df.reindex(list(l)).assign(grp=n)
                       for n, l in enumerate(powerset(df.index))
                       if df.loc[list(l), 'col1'].sum() <= 1500])
print(df_groups)
Output:
id col1 col2 grp
0 0 1000 250 1
2 2 1500 350 3
Explanation:
We use the dataframe's index to build every possible group of rows with the powerset function. enumerate labels each group, and assign writes that label into a new grp column. The result is every group of rows whose col1 sum satisfies the limit (1500 in the demo above; substitute 15000 for the question's constraint).
Reference: stackoverflow.com/questions/58119575
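For larger frames the powerset approach blows up (2^n subsets), so the standard 0/1 knapsack dynamic program is the scalable alternative. A sketch under that framing; knapsack is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1000, 2000, 1500, 3000, 4500, 8500, 6300],
                   'col2': [250, 750, 350, 800, 2500, 4450, 1250]})

def knapsack(weights, values, capacity):
    """0/1 knapsack: best[c] holds (max value, chosen row indices) for capacity c."""
    best = [(0, ())] * (capacity + 1)
    for i, (wi, vi) in enumerate(zip(weights, values)):
        # Iterate capacity downwards so each item is used at most once.
        for c in range(capacity, wi - 1, -1):
            cand = best[c - wi][0] + vi
            if cand > best[c][0]:
                best[c] = (cand, best[c - wi][1] + (i,))
    return best[capacity]

value, rows = knapsack(df['col1'].tolist(), df['col2'].tolist(), 15000)
print(value, df.loc[list(rows)])
```

This runs in O(n * capacity) time instead of O(2^n), which matters once the frame has more than a couple of dozen rows.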
Upvotes: 1