Reputation: 13
I have a dataframe whose columns contain lists, and I am trying to find the most efficient way to pair every element of the col1 list with every element of the col2 list in the same row (a per-row Cartesian product):
import pandas as pd
df = pd.DataFrame([[['a','b','c'],['l','m']],[['d','e','f'],['n','o']]], columns=['col1','col2'])
The expected output in this case would be:
col1 col2
0 [a, l] [a, m]
1 [b, l] [b, m]
2 [c, l] [c, m]
3 [d, n] [d, o]
4 [e, n] [e, o]
5 [f, n] [f, o]
I tried iterating over each row and applying itertools.combinations, but it crashes my system when the dataframe has a large number of rows. Can you suggest a more efficient way to do this? Thanks in advance.
Upvotes: 0
Views: 110
Reputation: 22503
You can also use itertools.product with numpy.reshape:
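from itertools import product
import numpy as np
import pandas as pd
# Build each row's Cartesian product, then regroup the pairs per col1 element.
# The (-1, 2, 2) shape assumes every col2 list has exactly two elements.
print(pd.DataFrame(np.reshape([list(product(a, b))
                               for a, b in df.to_numpy()],
                              (-1, 2, 2)).tolist()))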
0 1
0 [a, l] [a, m]
1 [b, l] [b, m]
2 [c, l] [c, m]
3 [d, n] [d, o]
4 [e, n] [e, o]
5 [f, n] [f, o]
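The (-1, 2, 2) shape above is tied to col2 lists of length two. As a minimal sketch that avoids hard-coding the shape, the same rows can be built with a nested comprehension. It assumes all col2 lists share the same length; the names rows and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[['a', 'b', 'c'], ['l', 'm']], [['d', 'e', 'f'], ['n', 'o']]],
    columns=['col1', 'col2'],
)

# One output row per element of col1, one output cell per element of col2.
rows = [
    [[x, y] for y in b]
    for a, b in zip(df['col1'], df['col2'])
    for x in a
]
result = pd.DataFrame(rows)
print(result)
```

Each cell is still a two-element list, and the number of output columns follows the length of the col2 lists.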
Upvotes: 1
Reputation: 28644
You can use itertools to get your output:
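from itertools import product, chain, tee, islice
import pandas as pd
# Flatten the per-row Cartesian products into one stream and duplicate it.
col1, col2 = tee(chain.from_iterable(product(a, b)
                                     for a, b
                                     in df.to_numpy()),
                 2)
# Take alternate pairs: even positions go to col1, odd positions to col2.
# This assumes every col2 list has exactly two elements.
col1 = islice(col1, None, None, 2)
col2 = islice(col2, 1, None, 2)
pd.DataFrame(zip(col1, col2), columns=["col1", "col2"])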
col1 col2
0 (a, l) (a, m)
1 (b, l) (b, m)
2 (c, l) (c, m)
3 (d, n) (d, o)
4 (e, n) (e, o)
5 (f, n) (f, o)
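The step of 2 in islice matches col2 lists of length two. A possible generalization of the same tee/islice idea, sketched under the assumption that every col2 list has the same length n (taken here from the first row), might look like this. The names pairs, n, cols and result are illustrative only.

```python
from itertools import chain, islice, product, tee

import pandas as pd

df = pd.DataFrame(
    [[['a', 'b', 'c'], ['l', 'm']], [['d', 'e', 'f'], ['n', 'o']]],
    columns=['col1', 'col2'],
)

# Assumption: every col2 list has the same length as the first one.
n = len(df['col2'].iloc[0])

# Flat stream of (col1 element, col2 element) pairs, row by row.
pairs = chain.from_iterable(
    product(a, b) for a, b in zip(df['col1'], df['col2'])
)

# The i-th copy of the stream keeps every n-th pair, starting at offset i.
cols = [islice(it, i, None, n) for i, it in enumerate(tee(pairs, n))]

result = pd.DataFrame(list(zip(*cols)))
print(result)
```

With n equal to 2 this should reproduce the table above, except that the columns get the default labels 0 and 1.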
Upvotes: 1