ImAUser
ImAUser

Reputation: 129

How to create a dataframe containing lthe cartesian product between two lists as elements?

I have a quite big dataframe containing 3 columns, each of which contains a list. Each list can be arbitrarily long (and is, sometimes, quite long).

 a          Yes                 No  
[1, 2, 3]  ["a", "b"]        ["A", "B", "C"]
[7, 11, 6] ["a", "d", "f"]   ["C", "H", "L", "Z"]

I want to create a new dataframe which looks like this:

C1    C2    value 
 1     a      1 
 1     b      1
 2     a      1
 2     b      1
...
 3     b      1
 1     A      0
 1     B      0
 1     C      0
...
 3     C      0
...
 7     a      1
 7     d      1
...
 6     Z      0      

The order of the rows does not matter. I'm looking for an efficient way of doing that.

What I'm doing at the moment is the following (I might be complicating things a bit):

new_df = pd.DataFrame(columns=["C1", "C2", "value"])
def compute_pairs(df1, new_df):
  perm = list(itertools.product(df1.a, df1.Yes))
  for p in perm:
    new_df.loc[len(new_df)] = list(p) + [1]
  perm = list(itertools.product(df1.a, df1.No))
  for p in perm:
    df.loc[len(new_df)] = list(p) + [0]

df1.apply(lambda x: compute_pairs(x, new_df), axis=1)

However, the for look is quite slow. I tried to use map but I failed.

Any improvements?

Upvotes: 0

Views: 41

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

Let's try melt with explode:

(df.melt('a',var_name='value',value_name='C')
   .explode('C').explode('a')
)

Output:

    a value  C
0   1   Yes  a
0   2   Yes  a
0   3   Yes  a
0   1   Yes  b
0   2   Yes  b
0   3   Yes  b
1   7   Yes  a
1  11   Yes  a
1   6   Yes  a
1   7   Yes  d
1  11   Yes  d
1   6   Yes  d
1   7   Yes  f
1  11   Yes  f
1   6   Yes  f
2   1    No  A
2   2    No  A
2   3    No  A
2   1    No  B
2   2    No  B
2   3    No  B
2   1    No  C
2   2    No  C
2   3    No  C
3   7    No  C
3  11    No  C
3   6    No  C
3   7    No  H
3  11    No  H
3   6    No  H
3   7    No  L
3  11    No  L
3   6    No  L
3   7    No  Z
3  11    No  Z
3   6    No  Z

Upvotes: 1

Related Questions