Srimanth

Reputation: 13

Python: create combinations of two columns containing lists as their value in a dataframe

I have a dataframe whose columns contain lists, and I am trying to find the most efficient way to build the combinations of the two lists in each row:

import pandas as pd

df = pd.DataFrame([[['a', 'b', 'c'], ['l', 'm']], [['d', 'e', 'f'], ['n', 'o']]],
                  columns=['col1', 'col2'])

The expected output in this case would be:

     col1   col2
0   [a, l]  [a, m]
1   [b, l]  [b, m]
2   [c, l]  [c, m]
3   [d, n]  [d, o]
4   [e, n]  [e, o]
5   [f, n]  [f, o]

I tried iterating through each row and applying itertools.combinations, but it crashes my system when the dataframe has a large number of rows. Can you suggest a more efficient way to do this? Thanks in advance.

Upvotes: 0

Views: 110

Answers (2)

Henry Yik

Reputation: 22503

You can also use itertools.product with numpy.reshape:

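from itertools import product

import numpy as np
import pandas as pd

# Build every (col1, col2) pair per row, then regroup the pairs two per output row.
# This needs every row to yield the same number of pairs so the array is rectangular.
print(pd.DataFrame(np.reshape([list(product(a, b))
                               for a, b in df.to_numpy()],
                              (-1, 2, 2)).tolist()))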

        0       1
0  [a, l]  [a, m]
1  [b, l]  [b, m]
2  [c, l]  [c, m]
3  [d, n]  [d, o]
4  [e, n]  [e, o]
5  [f, n]  [f, o]
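
If the lists do not all have the same length, the reshape to (-1, 2, 2) above will likely fail. A minimal sketch that avoids numpy altogether is below; the helper name pair_rows is only illustrative and not part of pandas or itertools, and the sample dataframe is the one from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[['a', 'b', 'c'], ['l', 'm']], [['d', 'e', 'f'], ['n', 'o']]],
    columns=['col1', 'col2'],
)


def pair_rows(frame):
    """Yield one output row per col1 element, paired with every col2 element.

    pair_rows is a hypothetical helper written for this example.
    """
    for a, b in zip(frame['col1'], frame['col2']):
        for x in a:
            # One output row: [x, y] for each y in the matching col2 list
            yield [[x, y] for y in b]


result = pd.DataFrame(list(pair_rows(df)))
print(result)
```

This keeps the work in plain Python loops, so it is not necessarily faster; it just removes the dependence on a fixed list length.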

Upvotes: 1

sammywemmy

Reputation: 28644

You can use itertools to get your output:

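from itertools import product, chain, tee, islice

import pandas as pd

# Flatten the per-row products into one stream of pairs and duplicate the stream
col1, col2 = tee(chain.from_iterable(product(a, b)
                                     for a, b
                                     in df.to_numpy()),
                 2)

# Take alternate pairs: even positions go to col1, odd positions to col2
# (this relies on each col2 list having exactly two elements)
col1 = islice(col1, None, None, 2)

col2 = islice(col2, 1, None, 2)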

pd.DataFrame(zip(col1, col2), columns=["col1", "col2"])

    col1    col2
0   (a, l)  (a, m)
1   (b, l)  (b, m)
2   (c, l)  (c, m)
3   (d, n)  (d, o)
4   (e, n)  (e, o)
5   (f, n)  (f, o)
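
The step of 2 in the islice calls matches the two elements in each col2 list. A rough sketch of the same idea for col2 lists of some other common length n follows; it assumes every col2 list has the same length, and the names n, flat and columns are only illustrative.

```python
from itertools import chain, product

import pandas as pd

df = pd.DataFrame(
    [[['a', 'b', 'c'], ['l', 'm']], [['d', 'e', 'f'], ['n', 'o']]],
    columns=['col1', 'col2'],
)

# Assumption: every col2 list has the same length as the first one
n = len(df['col2'].iloc[0])

# Flatten the per-row products into a single list of pairs
flat = list(chain.from_iterable(product(a, b) for a, b in df.to_numpy()))

# Every n-th pair, starting at offset i, belongs to output column i
columns = [flat[i::n] for i in range(n)]

result = pd.DataFrame(list(zip(*columns)))
print(result)
```

Materialising the pairs in a list uses more memory than the tee/islice version, but it makes the slicing easier to follow.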

Upvotes: 1
