Reputation: 1334
I am trying to find an effective / pythonic way to achieve the following:
- for each row in df, generate the cartesian product between Var1
, Var2
and Var3
.
- each tuple of the cartesian product would be a new row in a dataframe
- for each rows, insert the other columns in the original dataframe (in my example that would be Id
and Grp
I know that I can get the cartesian product with something like list(itertools.product(a, b, c))
, but I have no idea how to do this in a dataframe environment. I could do this in a for loop, but the production dataframe is likely to have over 100k rows, so I'm thinking efficiency first.
Python version used: 3.6+
Input dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id': [1,2],
'Grp': ['A','B'],
'Var1': [[11,12],[11,19]],
'Var2': [[0],[20,25]],
'Var3': [[34],[31,35,38]]
})
Expected output (either CartesianProduct
or Var1, Var2, Var3
is fine, I don't need both):
CartesianProduct Grp Id Var1 Var2 Var3
0 (11, 0, 34) A 1 11 0 34
1 (12, 0, 34) A 1 12 0 34
2 (11, 20, 31) B 2 11 20 31
3 (11, 20, 35) B 2 11 20 35
4 (11, 20, 38) B 2 11 20 38
5 (11, 25, 31) B 2 11 25 31
6 (11, 25, 35) B 2 11 25 35
7 (11, 25, 38) B 2 11 25 38
8 (19, 20, 31) B 2 19 20 31
9 (19, 20, 35) B 2 19 20 35
10 (19, 20, 38) B 2 19 20 38
11 (19, 25, 31) B 3 19 25 31
12 (19, 25, 35) B 4 19 25 35
13 (19, 25, 38) B 5 19 25 38
Upvotes: 1
Views: 336
Reputation: 164843
Here is one way with itertools
:
import numpy as np
import pandas as pd
from itertools import product, chain
df = pd.DataFrame({'Id': [1,2],
'Grp': ['A','B'],
'Var1': [[11,12],[11,19]],
'Var2': [[0],[20,25]],
'Var3': [[34],[31,35,38]]})
df['Cart'] = [list(product(*x)) for x in df[['Var1','Var2','Var3']].values.tolist()]
lens = list(map(len, df['Cart']))
df_out = pd.DataFrame({'Grp': np.repeat(df['Grp'].values, lens),
'Id': np.repeat(df['Id'].values, lens),
'Cart': list(chain(*df['Cart'].values))})
df_out = df_out.join(pd.DataFrame(df_out['Cart'].tolist(),
columns=['Var1', 'Var2', 'Var3']))
# Cart Grp Id Var1 Var2 Var3
# 0 (11, 0, 34) A 1 11 0 34
# 1 (12, 0, 34) A 1 12 0 34
# 2 (11, 20, 31) B 2 11 20 31
# 3 (11, 20, 35) B 2 11 20 35
# 4 (11, 20, 38) B 2 11 20 38
# 5 (11, 25, 31) B 2 11 25 31
# 6 (11, 25, 35) B 2 11 25 35
# 7 (11, 25, 38) B 2 11 25 38
# 8 (19, 20, 31) B 2 19 20 31
# 9 (19, 20, 35) B 2 19 20 35
# 10 (19, 20, 38) B 2 19 20 38
# 11 (19, 25, 31) B 2 19 25 31
# 12 (19, 25, 35) B 2 19 25 35
# 13 (19, 25, 38) B 2 19 25 38
Upvotes: 1