Restructuring pandas dataframe based on number of columns

I have the following pandas dataframe.

ID   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9
1     A      B       C      A     B      C      A      B      C
2     D      E       F      D     E      F      D      E      F
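
To make this reproducible, the frame above can be built roughly as follows. This is only a sketch: the values come from the table, while the construction (a list of rows plus generated column names) is an assumption.

```python
import pandas as pd

# Values copied from the table above; how the frame is built is an assumption.
df = pd.DataFrame(
    [[1, "A", "B", "C", "A", "B", "C", "A", "B", "C"],
     [2, "D", "E", "F", "D", "E", "F", "D", "E", "F"]],
    columns=["ID"] + [f"Col{i}" for i in range(1, 10)],
)
print(df)
```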

I would like to get one of the following tables by turning every group of three columns into its own row.

ID   Col_1   Col_2   Col_3
1      A       B       C
       A       B       C
       A       B       C

2      D       E       F
       D       E       F
       D       E       F

OR

ID   Col_1   Col_2   Col_3
1      A       B       C
1      A       B       C
1      A       B       C
2      D       E       F
2      D       E       F
2      D       E       F

Is there an efficient way to do this? I looked for similar examples here on Stack Overflow but couldn't find any. If you know of one, please point me to it.

Any help is appreciated!

Upvotes: 2

Answers (3)

anky

Reputation: 75100

Here is one with apply and np.reshape:

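import numpy as np
import pandas as pd
f = lambda x: pd.DataFrame(np.reshape(x.to_numpy(), (-1, 3))).add_prefix('Col_')  # one row per 3 values
df.set_index('ID').groupby(level='ID').apply(f).reset_index(level=1, drop=True).reset_index()  # ID stays out of the reshaped values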

   ID Col_0 Col_1 Col_2
0   1     A     B     C
1   1     A     B     C
2   1     A     B     C
3   2     D     E     F
4   2     D     E     F
5   2     D     E     F
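
The same reshape idea can probably be applied to the whole frame at once, without groupby. Below is a minimal sketch, assuming the number of non-ID columns is a multiple of k. The names k, vals, n_groups and out are only illustrative, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question so the snippet is self-contained.
df = pd.DataFrame(
    [[1, "A", "B", "C", "A", "B", "C", "A", "B", "C"],
     [2, "D", "E", "F", "D", "E", "F", "D", "E", "F"]],
    columns=["ID"] + [f"Col{i}" for i in range(1, 10)],
)

k = 3  # number of columns per output row (assumed to divide the column count)
vals = df.drop(columns="ID").to_numpy()   # shape (n_rows, n_cols), ID excluded
n_groups = vals.shape[1] // k             # output rows produced per input row

# Row-major reshape splits each input row into consecutive chunks of k values.
out = pd.DataFrame(vals.reshape(-1, k),
                   columns=[f"Col_{i + 1}" for i in range(k)])
# Repeat each ID once per chunk so it lines up with the reshaped rows.
out.insert(0, "ID", np.repeat(df["ID"].to_numpy(), n_groups))
print(out)
```

This relies on the columns already being in the order Col1 ... Col9; if they are not, they would need to be sorted first.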

Upvotes: 1

carnava1

Reputation: 319

I'm not sure how efficient it is, but it works:

k = 3  # every k columns form one output row
vals = df.drop(columns='ID').to_numpy()  # leave ID out so position 0 is Col1
pd.DataFrame([vals[:, [n, n+k, n+2*k]].flatten() for n in range(k)]).T

   0  1  2
0  A  B  C
1  A  B  C
2  A  B  C
3  D  E  F
4  D  E  F
5  D  E  F

Upvotes: 1

jezrael

Reputation: 863166

You can use DataFrame.stack after replacing all columns except ID with a MultiIndex built by integer and modulo division:

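import numpy as np

df = df.set_index('ID')
c = np.arange(len(df.columns))
# level 0: position within each block of three, level 1: block number
df.columns = [c % 3, c // 3]
# stack() moves the last level (the block number) into the rows
df1 = (df.stack()
         .reset_index(level=1, drop=True)
         .rename(columns= lambda x: f'Col_{x+1}')
         .reset_index())
print (df1)
   ID Col_1 Col_2 Col_3
0   1     A     B     C
1   1     A     B     C
2   1     A     B     C
3   2     D     E     F
4   2     D     E     F
5   2     D     E     F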
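
To see what the two MultiIndex levels contain, here is a small sketch for nine columns. The names block and pos are only illustrative: integer division numbers each block of three columns, and modulo gives the position inside the block.

```python
import numpy as np

c = np.arange(9)   # one integer per non-ID column
block = c // 3     # which block of three the column belongs to
pos = c % 3        # position of the column inside its block

print(block)  # [0 0 0 1 1 1 2 2 2]
print(pos)    # [0 1 2 0 1 2 0 1 2]
```

Whichever of the two levels is stacked ends up in the rows, and the other one stays as the columns.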

Upvotes: 5
