Reputation: 1393
I asked a similar question earlier ("Create a dataframe of permutations in pandas from list"), but this time I'm looking for a different output.
My lists are as follows:
aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']
Now I want to create a dataframe as follows:
aa bb cc
aa1 bb1 cc1
aa2 bb1 cc1
aa3 bb1 cc1
aa4 bb1 cc1
aa5 bb1 cc1
aa1 bb2 cc1
aa1 bb3 cc1
aa1 bb4 cc1
aa1 bb5 cc1
aa1 bb1 cc2
aa1 bb1 cc3
aa1 bb1 cc4
aa1 bb1 cc5
The previous suggestion I received was to use:
import itertools
import pandas as pd
lists = [aa, bb, cc]
pd.DataFrame(list(itertools.product(*lists)), columns=['aa', 'bb', 'cc'])
Which gives me a cartesian product.
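To make the mismatch concrete, here is a minimal sketch of that suggestion run on my lists (product_df is just an illustrative name). It produces every combination, so the row count is far larger than the 13 rows in the example above:

```python
import itertools

import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

lists = [aa, bb, cc]

# Cartesian product: one row per (aa, bb, cc) combination.
product_df = pd.DataFrame(list(itertools.product(*lists)),
                          columns=['aa', 'bb', 'cc'])

# 5 * 5 * 5 combinations in total.
print(len(product_df))
```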
But this time that's not quite what I'm looking for. I want the output to be exactly like the example above: each element of a list appears only once in its column, except for the first element, which is repeated to fill the rest of the column.
Appreciate any help!
Upvotes: 0
Views: 248
Reputation: 249374
First construct the repeating parts:
import pandas as pd
index = pd.RangeIndex(len(aa) + len(bb) + len(cc))
df = pd.DataFrame({'aa': aa[0], 'bb': bb[0], 'cc': cc[0]}, index=index)
That gives you 15 copies of:
aa1 bb1 cc1
Then overwrite the varying parts:
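# Assign through .iloc on df itself; chained assignment such as
# df.aa[...] = ... raises SettingWithCopyWarning and may not update df.
df.iloc[:len(aa), df.columns.get_loc('aa')] = aa
df.iloc[len(aa):len(aa)+len(bb), df.columns.get_loc('bb')] = bb
df.iloc[len(aa)+len(bb):, df.columns.get_loc('cc')] = cc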
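Which gives the desired output plus two extra aa1 bb1 cc1 rows (the first row of the bb block and of the cc block). Call df.drop_duplicates() to remove them and get exactly the 13 rows in the question.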
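The steps above can be put together as one self-contained script. This is only a sketch under a few assumptions: the dict named lists, the loop, and the result variable are illustrative names not taken from the question, and the final drop_duplicates() call assumes the input lists contain no repeated values of their own (otherwise it would remove more than the repeated baseline rows).

```python
import pandas as pd

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

# Illustrative mapping of column name -> values (not from the question).
lists = {'aa': aa, 'bb': bb, 'cc': cc}

# Repeating part: every row starts out as the first element of each list.
total = sum(len(values) for values in lists.values())
df = pd.DataFrame({name: values[0] for name, values in lists.items()},
                  index=pd.RangeIndex(total))

# Varying part: overwrite one block of rows per column.
start = 0
for name, values in lists.items():
    stop = start + len(values)
    df.iloc[start:stop, df.columns.get_loc(name)] = values
    start = stop

# The bb and cc blocks each begin with a repeat of the baseline row;
# drop those repeats and renumber the rows.
result = df.drop_duplicates().reset_index(drop=True)
print(result)
```

Using a single .iloc call per block writes into df directly, which avoids relying on chained assignment.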
Upvotes: 1