Get combinations of multiple columns (Cartesian product) of a pandas dataframe?

Question

So I have a dataframe representing various model estimates for the likelihood of each of a group of candidates winning an election.

             Steve     John      
    Model1   0.327586  0.289474 
    Model2   0.322581  0.285714 
    Model3   0.303030  0.294118

I want a dataframe representing all of the combinations of model values across columns, i.e. Cartesian product of all columns. So the above would be transformed to the below.

             model Steve     value Steve    model John     value John     
    0        Model1          0.327586       Model1         0.289474
    1        Model1          0.327586       Model2         0.285714
    2        Model1          0.327586       Model3         0.294118
    3        Model2          0.322581       Model1         0.289474
    4        Model2          0.322581       Model2         0.285714
    5        Model2          0.322581       Model3         0.294118
    6        Model3          0.303030       Model1         0.289474
    7        Model3          0.303030       Model2         0.285714
    8        Model3          0.303030       Model3         0.294118

The above is the simple case, but in theory I would like to be able to do this for N models and M candidates, resulting in a dataframe with N^M rows and 2M columns (in practice N < 20, M < 6).

In searching for an answer I've seen a lot of recommendations for the itertools module for stuff like this, but couldn't figure out how to get all of the combinations across multiple lists (itertools.combinations seems to only work for finding all of the combinations within a single list).

DataPsycho · Accepted Answer

It is always better to provide the code so that we can create a frame quickly, not just a table. Any way you can create a common key and can do database like cross join to get the final result. You can do it in one line but I am doing step by step.

import pandas as pd


df = pd.DataFrame({'model': ['model1', 'model2'],
                   'steve': ['a', 'b'],
                   'jhon': ['c', 'd']
                  })

# create a common key

df['key'] = 'xyz'

# create two seperate dataframe for self join
# but it is possible to use the direct operation (right side) in 
# inside of merge funciton

df_steve = df [['model', 'steve', 'key']]
df_jhon = df [['model', 'jhon', 'key']]

# self join    
pd.merge(df_steve, df_jhon, on='key', suffixes=('_steve', '_jhon')).drop('key', axis=1)

output:

  model_steve steve model_jhon jhon
0      model1     a     model1    c
1      model1     a     model2    d
2      model2     b     model1    c
3      model2     b     model2    d

One liner Code:

cross_df = pd.merge(df[['model', 'steve', 'key']], 
                    df[['model', 'jhon', 'key']], 
                    on='key', 
                    suffixes=('_steve', '_jhon')
                    ).drop('key', axis=1)

just change the column name according to your need.

Get combinations of multiple columns (Cartesian product) of a pandas dataframe?

Answers (2)

Related Questions