Multiindex on DataFrames and sum in Pandas

Question

I am trying to make use of the pandas MultiIndex. I want to regroup an existing DataFrame, df_original, based on its columns, and thought a MultiIndex would be the right tool.

df_original looks like this:

       by_currency   by_portfolio    A  B  C
1        AUD          a              1  2  3
2        AUD          b              4  5  6
3        AUD          c              7  8  9
4        AUD          d              10 11 12
5        CHF          a              13 14 15 
6        CHF          b              16 17 18
7        CHF          c              19 20 21
8        CHF          d              22 23 24

What I would like is a DataFrame with a two-level MultiIndex on the rows: the column names A, B and C as the outer level and by_portfolio as the inner level, with one column per currency. It should look like this:

              CHF     AUD
A       a     13      1
        b     16      4   
        c     19      7
        d     22      10
B       a     14      2
        b     17      5
        c     20      8
        d     23      11
C       a     15      3
        b     18      6
        c     21      9
        d     24      12

I have tried converting the columns of df_original and the desired index levels into lists and building a new DataFrame from them. This seems cumbersome, and I can't figure out how to fill in the actual values afterwards.

Perhaps some sort of groupby is better suited to this? The catch is that I will later need to add the resulting DataFrame to another DataFrame with a similar structure.

Thanks

joris · Accepted Answer

You can use a combination of stack and unstack (df here is your df_original):

In [50]: df.set_index(['by_currency', 'by_portfolio']).stack().unstack(0)
Out[50]:
by_currency     AUD  CHF
by_portfolio
a            A    1   13
             B    2   14
             C    3   15
b            A    4   16
             B    5   17
             C    6   18
c            A    7   19
             B    8   20
             C    9   21
d            A   10   22
             B   11   23
             C   12   24
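
For anyone who wants to run this step by step, here is a minimal, self-contained sketch. The frame is rebuilt from the values in the question, and the intermediate names indexed, stacked and wide are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "by_currency": ["AUD"] * 4 + ["CHF"] * 4,
    "by_portfolio": list("abcd") * 2,
    "A": [1, 4, 7, 10, 13, 16, 19, 22],
    "B": [2, 5, 8, 11, 14, 17, 20, 23],
    "C": [3, 6, 9, 12, 15, 18, 21, 24],
})

# Move the two grouping columns into the row index.
indexed = df.set_index(["by_currency", "by_portfolio"])

# stack() pivots the remaining column labels (A, B, C) into a new,
# innermost index level, giving a Series with a three-level index.
stacked = indexed.stack()

# unstack(0) pivots index level 0 (by_currency) back out into columns.
wide = stacked.unstack(0)

print(wide)
```

The result has one column per currency and a two-level row index of (by_portfolio, column label), which matches the Out[50] output above.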

To obtain your desired result, we only need to swap the levels of the index and then sort it (clearing the columns' name just tidies up the display):

In [51]: df2 = df.set_index(['by_currency', 'by_portfolio']).stack().unstack(0)

In [52]: df2.columns.name = None

In [53]: df2.index = df2.index.swaplevel(0,1)

In [55]: df2 = df2.sort_index()

In [56]: df2
Out[56]:
                AUD  CHF
  by_portfolio
A a               1   13
  b               4   16
  c               7   19
  d              10   22
B a               2   14
  b               5   17
  c               8   20
  d              11   23
C a               3   15
  b               6   18
  c               9   21
  d              12   24
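
Since the question also asks about adding the result to another, similar DataFrame, here is a hedged sketch of the whole pipeline followed by an addition. The second frame, other, is made up for illustration (it is not from the original post), and using DataFrame.add with fill_value=0 is one possible choice for handling rows that exist in only one of the two frames, not necessarily what the asker needs.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame({
    "by_currency": ["AUD"] * 4 + ["CHF"] * 4,
    "by_portfolio": list("abcd") * 2,
    "A": [1, 4, 7, 10, 13, 16, 19, 22],
    "B": [2, 5, 8, 11, 14, 17, 20, 23],
    "C": [3, 6, 9, 12, 15, 18, 21, 24],
})

# Same steps as in the answer: reshape, drop the columns' name,
# swap the two index levels, and sort the rows.
df2 = df.set_index(["by_currency", "by_portfolio"]).stack().unstack(0)
df2.columns.name = None
df2.index = df2.index.swaplevel(0, 1)
df2 = df2.sort_index()

# Hypothetical second frame with the same layout but one row missing,
# to show how the addition aligns on the MultiIndex.
other = (df2 * 10).iloc[:-1]

# add() aligns on both index levels and on the columns; fill_value=0
# treats a row missing from one frame as zero instead of producing NaN.
total = df2.add(other, fill_value=0)

print(total)
```

A plain df2 + other also aligns on the MultiIndex, but it leaves NaN wherever a row is missing from one of the frames.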
