machina12345

Reputation: 19

Creation of a multi-column, random-order dataframe from a series in Pandas

Given a single-column dataframe, is it possible, using only pandas calls, to randomise the row order and then split the values sequentially into multiple columns of length n?

import numpy as np
import pandas as pd
df = pd.read_csv('info.csv', low_memory=False, index_col=0)
df.head(5)

Which initially reads as:

   list
0  A
1  B
2  C
3  D
4  E

Then in order to randomise the order:

df = df.apply(np.random.permutation)
df.head(5)

Which then reads as:

   list
0  C
1  E
2  A
3  B
4  D

I have attempted a modified version of the call below, but I am not sure whether it is appropriate:

df = pd.DataFrame([list[n:n+2] for n in range(0, len(list), 2)], columns=columnNames)

I would like a final dataframe in the format below, where in this case each column has 3 rows:

   list1  list2 ... listn
0    C      B        ...
1    E      D        ...
2    A     ...       ...

Is this possible with a single-line pandas expression?

Thanks in advance!

Upvotes: 1

Views: 35

Answers (1)

jezrael

Reputation: 863166

You can use a dictionary comprehension of Series to create the DataFrame, which also works when the Series have different lengths:

L = np.random.permutation(df['list'])
N = 3
df = (pd.DataFrame({i: pd.Series(L[n:n+N]) for i,n in enumerate(range(0, len(L), N))})
       .add_prefix('list'))
print (df)
  list0 list1
0     A     D
1     C     B
2     E   NaN
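
For reference, the same idea can be wrapped in a small self-contained function. This is only a sketch: the name split_shuffled and the seed parameter are assumptions added for illustration (the seed makes the shuffle repeatable), and the input is built inline instead of being read from info.csv.

```python
import numpy as np
import pandas as pd


def split_shuffled(values, n, seed=None):
    """Shuffle `values` and lay them out in columns of at most `n` rows."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatable output
    shuffled = rng.permutation(np.asarray(values, dtype=object))
    # One Series per chunk of n values; a shorter last chunk is padded with NaN
    # when pandas aligns the Series on their index.
    columns = {
        f"list{i}": pd.Series(shuffled[start:start + n])
        for i, start in enumerate(range(0, len(shuffled), n))
    }
    return pd.DataFrame(columns)


df = pd.DataFrame({"list": list("ABCDE")})  # stand-in for the CSV data
result = split_shuffled(df["list"], n=3, seed=0)
print(result)
```

With 5 values and n=3 this gives 3 rows and 2 columns, and the last cell of the second column is NaN because the final chunk holds only 2 values.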

A non-loop solution; whether it is faster is best tested:

N = 3
df = (pd.DataFrame({'a': np.random.permutation(df['list'])})
        .assign(b = lambda x: x.index // N, c = lambda x: x.index % N)
        .pivot(index='c', columns='b', values='a')
        .add_prefix('list')
        .rename_axis(index=None, columns=None))


print (df)
  list0 list1
0     B     D
1     A     C
2     E   NaN
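
To check which variant is faster on your data, a rough timing sketch along these lines may help. The function names with_comprehension and with_pivot, the 1000-element test array and the repeat count are assumptions chosen for illustration; pivot is called with keyword arguments, which recent pandas versions require.

```python
import timeit

import numpy as np
import pandas as pd

N = 3
values = np.array(list("ABCDE") * 200, dtype=object)  # hypothetical test data
shuffled = np.random.default_rng(0).permutation(values)


def with_comprehension(arr):
    # One Series per chunk of N values, combined into a DataFrame.
    return pd.DataFrame(
        {i: pd.Series(arr[n:n + N]) for i, n in enumerate(range(0, len(arr), N))}
    ).add_prefix("list")


def with_pivot(arr):
    # Column number and row number derived from the position, then pivoted.
    return (
        pd.DataFrame({"a": arr})
        .assign(b=lambda x: x.index // N, c=lambda x: x.index % N)
        .pivot(index="c", columns="b", values="a")
        .add_prefix("list")
        .rename_axis(index=None, columns=None)
    )


t_loop = timeit.timeit(lambda: with_comprehension(shuffled), number=5)
t_pivot = timeit.timeit(lambda: with_pivot(shuffled), number=5)
print(f"comprehension: {t_loop:.3f}s, pivot: {t_pivot:.3f}s")
```

The timings depend on the data size, N and the pandas version, so it is worth running this with values close to the real use case.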

Upvotes: 2
