Neobux
Neobux

Reputation: 35

Getting all row combinations from a pandas dataframe based on certain column conditions?

I have a Pandas Dataframe that stores a food item on each row in the following format -

Id   Calories   Protein   IsBreakfast   IsLunch   IsDinner
1      300        6           0           1          0    
2      400        12          1           1          0
.
.
.   
100    700        25          0           1          1            

I want to print all three-row combinations with the following conditions -

  1. The combinations should contain at least one of the breakfast, lunch, and dinner.
  2. sum of calories should be between certain range (say minCal < sum of calories in three rows < maxCal)
  3. similar condition for proteins too.

Right now, I am first iterating over all breakfast items, choosing lunch items. Then iterating over all dinner items. After selecting a combination, I am adding relevant columns and checking if values are within the desired range

Upvotes: 3

Views: 2779

Answers (2)

foglerit
foglerit

Reputation: 8269

You can use the approach described in this answer to generate a new DataFrame containing all the combinations of three rows from your original data:

from itertools import combinations
import pandas as pd

# Using skbrhmn's df
df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

comb_rows = list(combinations(df.index, 3))
comb_rows

Output:

[(0, 1, 2),
 (0, 1, 3),
 (0, 1, 4),
 (0, 2, 3),
 (0, 2, 4),
 (0, 3, 4),
 (1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4)]

Then create a new DataFrame containing the sum of all numeric fields in your original frame, over all the possible combinations of three rows:

combinations = pd.DataFrame([df.loc[c,:].sum() for c in comb_rows], index=comb_rows)

print(combinations)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 2)       600       60            2        1         3
(0, 1, 3)       700       70            2        1         2
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
(0, 3, 4)      1000      100            1        2         2
(1, 2, 3)       900       90            1        0         2
(1, 2, 4)      1000      100            1        1         3
(1, 3, 4)      1100      110            1        1         2
(2, 3, 4)      1200      120            0        1         2

Finally you can apply any filters you need:

filtered = combinations[
    (combinations.IsBreakfast>0) &
    (combinations.IsLunch>0) &
    (combinations.IsDinner>0) &
    (combinations.Calories>600) &
    (combinations.Calories<1000) &
    (combinations.Protein>=80) &
    (combinations.Protein<120)
]
print(filtered)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3

Upvotes: 1

skbrhmn
skbrhmn

Reputation: 1199

You can add combinations of filters to a dataframe using the | and & operators. Creating a dummy dataframe for example:

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})
print(df1)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
0       100       10            1        1         1
1       200       20            1        0         1
2       300       30            0        0         1
3       400       40            0        0         0
4       500       50            0        1         1

Now add all the conditions:

min_cal = 100
max_cal = 600
min_prot = 10
max_prot = 40
df_filtered = df1[
    ((df1['IsBreakfast']==1) | (df1['IsLunch']==1) | (df1['IsDinner']==1)) &
    ((df1['Calories'] > min_cal) & (df1['Calories'] < max_cal)) &
    ((df1['Protein'] > min_prot) & (df1['Protein'] < max_prot))
]

print(df_filtered)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
1       200       20            1        0         1
2       300       30            0        0         1

Upvotes: 0

Related Questions