Astrid

Reputation: 1936

Re-shaping pandas data frame using shape or pivot_table (stack each row)

I have an almost embarrassingly simple question, which I cannot figure out for myself.

Here's a toy example to demonstrate what I want to do. Suppose I have this simple data frame:

df = pd.DataFrame([[1,2,3,4,5,6],[7,8,9,10,11,12]],index=range(2),columns=list('abcdef'))

   a  b  c   d   e   f
0  1  2  3   4   5   6
1  7  8  9  10  11  12

What I want is to stack it so that it takes the following form, where the column identifiers have been changed (to X and Y) so that they are the same for all re-stacked values:

    X   Y
0   1   2
    3   4
    5   6
1   7   8
    9  10
   11  12

I am pretty sure this can be done with df.stack() or pd.pivot_table(), but I have read the documentation and cannot figure out how. Instead of appending every column to the end of the previous one, I just want to stack pairs (or actually triplets) of values from each row.

Just to add some more flesh to the bones of what I want to do:

df = pd.DataFrame(np.random.randn(3,6),index=range(3),columns=list('abcdef'))

          a         b         c         d         e         f
0 -0.168636 -1.878447 -0.985152 -0.101049  1.244617  1.256772
1  0.395110 -0.237559  0.034890 -1.244669 -0.721756  0.473696
2 -0.973043  1.784627  0.601250 -1.718324  0.145479 -0.099530

I want this to be re-stacked into this form (where the column labels have again been changed, so that they are the same for all values):

          X         Y         Z       
0 -0.168636 -1.878447 -0.985152 
  -0.101049  1.244617  1.256772
1  0.395110 -0.237559  0.034890 
  -1.244669 -0.721756  0.473696
2 -0.973043  1.784627  0.601250 
  -1.718324  0.145479 -0.099530

Yes, one could just make a for-loop with the following logic operating on each row:

row.values.reshape(df.shape[1] // 3, 3)

But then you would have to compute each row individually, and my actual data has tens of thousands of rows.

So I want to stack each individual row selectively (e.g. by pairs of values or triplets), and then stack that row-stack, for the entire data frame, basically. Preferably done on the entire data frame at once (if possible).

Apologies for such a trivial question.

Upvotes: 2

Views: 2542

Answers (1)

unutbu

Reputation: 879641

Use numpy.reshape to reshape the underlying data in the DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3,6),index=range(3),columns=list('abcdef'))
print(df)
#           a         b         c         d         e         f
# 0 -0.889810  1.348811 -1.071198  0.091841 -0.781704 -1.672864
# 1  0.398858  0.004976  1.280942  1.185749  1.260551  0.858973
# 2  1.279742  0.946470 -1.122450 -0.355737  1.457966  0.034319
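# reshape(-1, 3) reads the values row by row and regroups them into triplets.
# Each original row has 6 values, so it becomes 2 new rows: repeat each index label twice.
result = pd.DataFrame(df.values.reshape(-1,3),
                      index=df.index.repeat(2), columns=list('XYZ'))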
print(result)

yields

          X         Y         Z
0 -0.889810  1.348811 -1.071198
0  0.091841 -0.781704 -1.672864
1  0.398858  0.004976  1.280942
1  1.185749  1.260551  0.858973
2  1.279742  0.946470 -1.122450
2 -0.355737  1.457966  0.034319
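
The same idea can be wrapped in a small helper so the group width (pairs, triplets, and so on) is not hard-coded. This is only a sketch: restack_rows and its new_columns argument are names made up for illustration, not part of pandas, and it assumes every column holds the same kind of data and that the number of columns is an exact multiple of the group width.

```python
import numpy as np
import pandas as pd


def restack_rows(df, new_columns):
    """Split each row of df into consecutive chunks of len(new_columns) values.

    Hypothetical helper for illustration; not a pandas function.
    """
    width = len(new_columns)
    if df.shape[1] % width != 0:
        raise ValueError("number of columns must be a multiple of len(new_columns)")
    chunks_per_row = df.shape[1] // width
    # reshape reads the values in row-major (C) order by default,
    # so the chunks taken from one original row stay next to each other.
    data = df.to_numpy().reshape(-1, width)
    # Repeat each index label once per chunk produced from that row.
    return pd.DataFrame(data,
                        index=df.index.repeat(chunks_per_row),
                        columns=list(new_columns))


# The toy frame from the question, restacked as pairs and as triplets.
toy = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
                   index=range(2), columns=list('abcdef'))
pairs = restack_rows(toy, 'XY')
triplets = restack_rows(toy, 'XYZ')
print(pairs)
print(triplets)
```

Because the whole frame is reshaped in one call, there is no Python-level loop over rows, which matters when the data has tens of thousands of rows.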

Upvotes: 3
