Woodyinho11
Woodyinho11

Reputation: 13

Performing Permutations and Combinations

so I have 6 Data Frames, each with 45 columns with the same headers. I give an example of 3 of them below:

df1

Name Attr1 Attr2 ... Attr44
'Apple' 10 -15 ... 10
'Orange' 20 -20 ... 0

df2

Name Attr1 Attr2 ... Attr44
'Cabbage' 40 -25 ... 10
'Lettuce' 0 -35 ... 15

df3

Name Attr1 Attr2 ... Attr44
'Sugar' 17 -15 ... 8
'Salt' 19 0 ... 12

Basically, the first column contains str data, and the subsequent 44 columns are numpy.int64. I am attempting to create a separate Data Frame which is populated by all possible combinations of the sum of the different rows of each Data Frames, for e.g.:

Combined

Name Attr1 Attr2 ... Attr44
'AppleCabbageSugar' 67 -55 ... 28
'AppleCabbageSalt' 29 -40 ... 32
'AppleLettuceSugar' 27 -65 ... 33
'AppleLettuceSalt' 29 -50 ... 37
'OrangeCabbageSugar' 77 -60 ... 33
...
'OrangeLettuceSalt' 39 -55 ... 27

i.e.:
For the first row of Combined:
the 'Dragon SwordDragon ArmourDragon Boots' for Name is given by the sum of (Name of the first row of df1, Name of the first row of df2, Name of the first row of df3)
the 67 for Attr1 is given by the sum of (Attr1 of the first row of df1, Attr1 of the first row of df2, Attr1 of the first row of df3)
and so on...

For the second row of Combined:
the 'Dragon SwordDragon ArmourAres Boots' for Name is given by the sum of (Name of the first row of df1, Name of the first row of df2, Name of the second row of df3)
the 29 for Attr1 is given by the sum of (Attr1 of the first row of df1, Attr1 of the first row of df2, Attr1 of the second row of df3)
and so on...

So far, this is the roughly the code I have been using:

count = 0
for idx1 in range(df1.shape[0]):
    for idx2 in range(df2.shape[0]):
        for idx3 in range(df3.shape[0]):
            new_df.iloc[count] = df1.iloc[idx1] + df2.iloc[idx2] + df3.iloc[idx3]
            count += 1

This seems to work fine when I am only summing 2 Data Frames this way. However, when I expand it to 3 and beyond, the order that the columns are returned is messed up. For example, the Name column which is supposed to be in the first column, ends up in the 10th column.

Could someone explain to me what is wrong here, what I can do, or if there are better ways to perform such a task? Thank you very much.

P.S. it is my first post here so I apologise if the write-up is not up to the standard or requirements of the community

Upvotes: 1

Views: 94

Answers (1)

BeRT2me
BeRT2me

Reputation: 13242

Merging with cross gives the "cartesian product of rows of both frames." Doing this twice gives us the product of all three.

I've used filter(regex=f'^{col}(_|$)') here to group like columns together... same columns will either have the same name, or be the same name with a suffix split with _.

cols = df1.columns # Assuming all three dfs have the same columns.
df = df1.merge(df2, 'cross').merge(df3, 'cross')
for col in cols:
    # We can sum both strings and number columns.
    df[col] = df.filter(regex=f'^{col}(_|$)').sum(axis=1)

df = df[cols]
print(df)

Output:

                 Name  Attr1  Attr2
0   AppleCabbageSugar     67    -55
1    AppleCabbageSalt     69    -40
2   AppleLettuceSugar     27    -65
3    AppleLettuceSalt     29    -50
4  OrangeCabbageSugar     77    -60
5   OrangeCabbageSalt     79    -45
6  OrangeLettuceSugar     37    -70
7   OrangeLettuceSalt     39    -55

Upvotes: 1

Related Questions