Bob Fang
Bob Fang

Reputation: 7411

How to set and group pandas multi-level columns?

I have a dataframe that has the shape like this:

   PX_LAST PX_OPEN PX_CLOSE ticker source timestamp
0        1       2        3      A   LSE   20180101
1        4       5        6      A   LSE   20180102
1        7       8        9      B   LSE   20180101
1       10      11       12      B   LSE   20180102
....

I want to massage it to the following format:

                                     A                          B
                                   LSE                        LSE
            PX_LAST, PX_CLOSE, PX_OPEN PX_LAST, PX_CLOSE, PX_OPEN
timestamp 
20180101          1         2       3        7         8        9 
20180102          4         5       6       10        11       12
....

I tried to first use set_index to set the ticker and source columns to the row index and use unstack to push them on to the column axis which does seem to work

df.set_index(['timestamp', 'ticker', 'source'])
    .unstack(level=[1,2])
    .swaplevel(0,1,axis=1)
    .swaplevel(1,2,axis=1)

This does the trick, but has two problems: 1) it is very verbose, with all the swaplevel call we need to do to get the columns into the right shape. 2) It seems it does not do the grouping I wish it does, i.e the result I get is like this:

              LSE     LSE      LSE      LSE ...
          PX_LAST PX_LAST PX_CLOSE PX_CLOSE ...
timestamp 
20180101       1        7        2       8  ...
20180102       4        8        5      11  ...

Is there a cleaner way to do this so I can get the format I want?

Upvotes: 2

Views: 1121

Answers (3)

Valdi_Bo
Valdi_Bo

Reputation: 31011

My proposition is to change your solution the following way:

Step 1: df.set_index(['timestamp', 'ticker', 'source']).unstack([1, 2]), just as you did.

It leaves columns in the following shape:

          PX_LAST     PX_OPEN     PX_CLOSE
ticker          A   B       A   B        A   B
source        LSE LSE     LSE LSE      LSE LSE

(and timestamp as the index).

Step 2: reorder_levels([1, 2, 0], axis=1), instead of your 2 swaplevel instructions.

It leaves columns as:

ticker          A       B       A       B        A        B
source        LSE     LSE     LSE     LSE      LSE      LSE
          PX_LAST PX_LAST PX_OPEN PX_OPEN PX_CLOSE PX_CLOSE

And the last step is sort_index(axis=1, level=[0,1], sort_remaining=False)

Note that you sort only level 0 and 1, so the order in the last level remains unchanged (PX_LAST, PX_OPEN, PX_CLOSE).

So the whole script (i.e. a single instruction) is:

df2 = df.set_index(['timestamp', 'ticker', 'source']).unstack([1, 2])\
    .reorder_levels([1, 2, 0], axis=1)\
    .sort_index(axis=1, level=[0,1], sort_remaining=False)

When you print the result, you will get:

ticker          A                        B
source        LSE                      LSE
          PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN PX_CLOSE
timestamp
20180101        1       2        3       7       8        9
20180102        4       5        6      10      11       12

Upvotes: 0

jezrael
jezrael

Reputation: 863291

Your solution should be a bit changed - with order of columns in set_index, omit second swaplevel and added sort_index:

df = (df.set_index(['timestamp', 'source', 'ticker'])
        .unstack(level=[1,2])
        .swaplevel(0,2,axis=1)
        .sort_index(axis=1)
)
print (df)
ticker           A                        B                
source         LSE                      LSE                
          PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp                                                  
20180101         3       1       2        9       7       8
20180102         6       4       5       12      10      11

Upvotes: 1

cs95
cs95

Reputation: 402862

One option is melt, set_index and unstack:

u = df.melt(['ticker', 'source', 'timestamp'])
(u.set_index(u.columns.difference({'value'}).tolist())['value']
  .unstack([1, 0, -1])
  .sort_index(axis=1))

ticker           A                        B                
source         LSE                      LSE                
variable  PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp                                                  
20180101         3       1       2        9       7       8
20180102         6       4       5       12      10      11

Or melt, and pivot_table:

u = df.melt(['ticker', 'source', 'timestamp'])
u.pivot_table(index='timestamp', 
              columns=['ticker','source','variable'], 
              values='value')

ticker           A                        B                
source         LSE                      LSE                
variable  PX_CLOSE PX_LAST PX_OPEN PX_CLOSE PX_LAST PX_OPEN
timestamp                                                  
20180101         3       1       2        9       7       8
20180102         6       4       5       12      10      11

Upvotes: 3

Related Questions