Reputation: 161
So I have a dataframe representing various model estimates for the likelihood of each of a group of candidates winning an election.
Steve John
Model1 0.327586 0.289474
Model2 0.322581 0.285714
Model3 0.303030 0.294118
I want a dataframe representing all of the combinations of model values across columns, i.e. Cartesian product of all columns. So the above would be transformed to the below.
model Steve value Steve model John value John
0 Model1 0.327586 Model1 0.289474
1 Model1 0.327586 Model2 0.285714
2 Model1 0.327586 Model3 0.294118
3 Model2 0.322581 Model1 0.289474
4 Model2 0.322581 Model2 0.285714
5 Model2 0.322581 Model3 0.294118
6 Model3 0.303030 Model1 0.289474
7 Model3 0.303030 Model2 0.285714
8 Model3 0.303030 Model3 0.294118
The above is the simple case, but in theory I would like to be able to do this for N models and M candidates, resulting in a dataframe with N^M rows and 2M columns (in practice N < 20, M < 6).
In searching for an answer I've seen a lot of recommendations for the itertools
module for stuff like this, but couldn't figure out how to get all of the combinations across multiple lists (itertools.combinations
seems to only work for finding all of the combinations within a single list).
Upvotes: 2
Views: 2891
Reputation: 863801
Use:
from itertools import product
#get all combinations of all columns
a = product(*[zip(df.index, x) for x in df.T.values])
#create new columns names
cols = [c for x in df.columns for c in ('model_' + x, 'value_' + x)]
#flattening nested lists with DataFrame contructor
df1 = pd.DataFrame([[y for x in z for y in x] for z in a], columns=cols)
print (df1)
model_Steve value_Steve model_John value_John
0 Model1 0.327586 Model1 0.289474
1 Model1 0.327586 Model2 0.285714
2 Model1 0.327586 Model3 0.294118
3 Model2 0.322581 Model1 0.289474
4 Model2 0.322581 Model2 0.285714
5 Model2 0.322581 Model3 0.294118
6 Model3 0.303030 Model1 0.289474
7 Model3 0.303030 Model2 0.285714
8 Model3 0.303030 Model3 0.294118
Upvotes: 4
Reputation: 988
It is always better to provide the code so that we can create a frame quickly, not just a table. Any way you can create a common key
and can do database like cross join to get the final result. You can do it in one line but I am doing step by step.
import pandas as pd
df = pd.DataFrame({'model': ['model1', 'model2'],
'steve': ['a', 'b'],
'jhon': ['c', 'd']
})
# create a common key
df['key'] = 'xyz'
# create two seperate dataframe for self join
# but it is possible to use the direct operation (right side) in
# inside of merge funciton
df_steve = df [['model', 'steve', 'key']]
df_jhon = df [['model', 'jhon', 'key']]
# self join
pd.merge(df_steve, df_jhon, on='key', suffixes=('_steve', '_jhon')).drop('key', axis=1)
output:
model_steve steve model_jhon jhon
0 model1 a model1 c
1 model1 a model2 d
2 model2 b model1 c
3 model2 b model2 d
One liner Code:
cross_df = pd.merge(df[['model', 'steve', 'key']],
df[['model', 'jhon', 'key']],
on='key',
suffixes=('_steve', '_jhon')
).drop('key', axis=1)
just change the column name according to your need.
Upvotes: 2