Create a dataframe of permutations in pandas from lists

Question

I asked a similar question earlier (Create a dataframe of permutations in pandas from list), but this time I'm looking for a different output.

My lists are as follows:

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

Now I want to create a dataframe as follows:

aa    bb    cc
aa1   bb1   cc1
aa2   bb1   cc1
aa3   bb1   cc1
aa4   bb1   cc1
aa5   bb1   cc1
aa1   bb2   cc1
aa1   bb3   cc1
aa1   bb4   cc1
aa1   bb5   cc1
aa1   bb1   cc2
aa1   bb1   cc3
aa1   bb1   cc4
aa1   bb1   cc5

The previous suggestion I received was to use:

import itertools
import pandas as pd
lists = [aa, bb, cc]
pd.DataFrame(list(itertools.product(*lists)), columns=['aa', 'bb', 'cc'])

That gives me the full Cartesian product (every combination of the three lists).
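For comparison, here is a minimal sketch of what the earlier suggestion produces with the lists above. `itertools.product` pairs every element of each list with every element of the others, so three 5-element lists yield 5 × 5 × 5 = 125 rows, with the last column varying fastest.

```python
import itertools

import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

lists = [aa, bb, cc]
# Cartesian product: every (aa, bb, cc) combination appears exactly once
product_df = pd.DataFrame(list(itertools.product(*lists)), columns=['aa', 'bb', 'cc'])

print(len(product_df))  # 125
```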

This time that's not what I'm looking for; I want the output to match the example above exactly. Each column should step through its own list once while the other columns stay at their first element, so every element appears only once in its column except the first, which repeats to fill the rest of the column.

Appreciate any help!

John Zwinck · Accepted Answer

First construct the repeating parts:

import pandas as pd
index = pd.RangeIndex(len(aa) + len(bb) + len(cc))
# Scalar values are broadcast to every row of the index
df = pd.DataFrame({'aa': aa[0], 'bb': bb[0], 'cc': cc[0]}, index=index)

That gives you 15 copies of:

aa1   bb1   cc1
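A quick way to check this intermediate step, sketched with the three lists from the question: the frame has one row per element across all lists, and every row is the same "first element" row.

```python
import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

index = pd.RangeIndex(len(aa) + len(bb) + len(cc))
# A dict of scalars needs an explicit index; each scalar fills its whole column
df = pd.DataFrame({'aa': aa[0], 'bb': bb[0], 'cc': cc[0]}, index=index)

print(df.shape)  # (15, 3)
# Only one distinct row remains after removing duplicates
print(len(df.drop_duplicates()))  # 1
```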

Then overwrite the varying parts:

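# .loc writes into df itself (chained df.aa[...] = ... may not under copy-on-write);
# note that .loc label slices include their end point
df.loc[:len(aa) - 1, 'aa'] = aa
df.loc[len(aa):len(aa) + len(bb) - 1, 'bb'] = bb
df.loc[len(aa) + len(bb):, 'cc'] = cc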

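Which gives 15 rows: aa1 through aa5 against bb1/cc1, then bb1 through bb5, then cc1 through cc5. The row aa1 bb1 cc1 starts each of the three blocks, so it appears three times; to get exactly the 13 rows from the question, drop the repeats with df = df.drop_duplicates().reset_index(drop=True).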
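If you have more than three lists, or want to avoid the index arithmetic, the same pattern can be written as a small helper. This is a sketch of an alternative, not part of the answer above: `vary_one_at_a_time` is a hypothetical name, and it assumes every list is non-empty. It starts from the row of first elements and then, column by column, swaps in each remaining value while the other columns stay at their first element, which yields exactly the 13 rows shown in the question.

```python
import pandas as pd


def vary_one_at_a_time(columns):
    """Hypothetical helper: a baseline row of first elements, then one row per
    remaining value of each column with the other columns held at their first
    element. Assumes every list in `columns` is non-empty."""
    baseline = {name: values[0] for name, values in columns.items()}
    rows = [baseline]
    for name, values in columns.items():
        for value in values[1:]:
            # Copy the baseline and change just this one column
            rows.append({**baseline, name: value})
    return pd.DataFrame(rows, columns=list(columns))


aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

df = vary_one_at_a_time({'aa': aa, 'bb': bb, 'cc': cc})
print(len(df))  # 13
print(df)
```

Because dicts keep insertion order, the column order and the order of the varied blocks follow the order of the keys you pass in.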
