Reputation: 33
I have a MultiIndex dataframe looks like, this is only partial. The Year range frome 2007 to 2015 with the same places for each year.
Jan Feb Mar Apr May June July Aug Sept Oct \
Year Place
2007 Johore 1.26 1.07 1.21 1.27 1.33 1.28 1.67 1.88 1.89 1.86
Kedah 1.20 1.27 1.50 1.38 1.38 1.52 1.84 2.09 2.08 2.02
Kelantan 0.92 0.90 1.01 1.10 1.07 0.87 0.93 1.02 1.08 1.17
Malacca 1.62 1.45 1.64 1.52 1.50 1.40 1.75 1.80 2.03 2.14
N. Sembilan 0.98 0.94 1.11 1.07 1.10 1.16 1.46 1.58 1.61 1.71
Nov Dec
Year Place
2007 Johore 1.95 1.72
Kedah 1.79 1.39
Kelantan 1.29 0.97
Malacca 2.44 2.13
N. Sembilan 1.75 1.58
I want to rotate the data and get a single index dataframe with the index being months (e.g. 2007-Jan,2007-Feb ) and the columns being different places.
I tried 'Pahang' as an example and did:
In [14]:
Pahang=df.xs('Pahang',level='Place')
In [15]:
Pahang.unstack().unstack().unstack()
Out[15]:
Year
2007 Jan 1.19
Feb 1.01
Mar 1.13
Apr 1.19
May 1.24
June 1.17
July 1.43
Aug 1.59
Sept 1.63
Oct 1.64
Nov 1.82
Dec 1.31
2008 Jan 1.57
Feb 1.36
Mar 1.56
...
2014 Oct 1.87
Nov 1.74
Dec 1.09
2015 Jan 0.93
Feb 1.02
Mar 1.28
Apr 1.51
May NaN
June NaN
July NaN
Aug NaN
Sept NaN
Oct NaN
Nov NaN
Dec NaN
Length: 108, dtype: float64
I get the Pahang column as I want. Instead of doing it one place at a time, I am wondering if there is a way to loop through all places in a faster way. Thanks!
Upvotes: 3
Views: 1933
Reputation: 30444
I was thinking roughly along the same lines as @HappyLeapSecond, but will add this because it's not quite the same and is a little more general (for all rows rather than just a particular one).
First off, I'm going to use a slightly different sample dataset. Note also that I'm posting without the multi-index because a single level index is easier to copy and paste into pandas.
year place Jan Feb Mar Apr May June July Aug
0 2007 Johore 1.26 1.07 1.21 1.27 1.33 1.28 1.67 1.88
1 2007 Kedah 1.20 1.27 1.50 1.38 1.38 1.52 1.84 2.09
2 2007 Kelantan 0.92 0.90 1.01 1.10 1.07 0.87 0.93 1.02
3 2007 Malacca 1.62 1.45 1.64 1.52 1.50 1.40 1.75 1.80
4 2008 Johore 1.26 1.07 1.21 1.27 1.33 1.28 1.67 1.88
5 2008 Kedah 1.20 1.27 1.50 1.38 1.38 1.52 1.84 2.09
6 2008 Kelantan 0.92 0.90 1.01 1.10 1.07 0.87 0.93 1.02
7 2008 Malacca 1.62 1.45 1.64 1.52 1.50 1.40 1.75 1.80
Then, set the index so it's comparable to the one in the question:
df = df.reset_index(drop=True).set_index(['year','place'])
Jan Feb Mar Apr May June July Aug
year place
2007 Johore 1.26 1.07 1.21 1.27 1.33 1.28 1.67 1.88
Kedah 1.20 1.27 1.50 1.38 1.38 1.52 1.84 2.09
Kelantan 0.92 0.90 1.01 1.10 1.07 0.87 0.93 1.02
Malacca 1.62 1.45 1.64 1.52 1.50 1.40 1.75 1.80
2008 Johore 1.26 1.07 1.21 1.27 1.33 1.28 1.67 1.88
Kedah 1.20 1.27 1.50 1.38 1.38 1.52 1.84 2.09
Kelantan 0.92 0.90 1.01 1.10 1.07 0.87 0.93 1.02
Malacca 1.62 1.45 1.64 1.52 1.50 1.40 1.75 1.80
And then some switching around, resorting, etc. The main "problem" you have in your data is that you start with the year in the row axis but the month in the column axis. So what you need to do is move the year indexing from the row to the column. That's done with unstack(level='year')
. The rest of it is basically just a matter of cleaning up.
df.unstack(level='year').swaplevel(0,1,axis=1).T.sortlevel(0)
place Johore Kedah Kelantan Malacca
year
2007 Jan 1.26 1.20 0.92 1.62
Feb 1.07 1.27 0.90 1.45
Mar 1.21 1.50 1.01 1.64
Apr 1.27 1.38 1.10 1.52
May 1.33 1.38 1.07 1.50
June 1.28 1.52 0.87 1.40
July 1.67 1.84 0.93 1.75
Aug 1.88 2.09 1.02 1.80
2008 Jan 1.26 1.20 0.92 1.62
Feb 1.07 1.27 0.90 1.45
Mar 1.21 1.50 1.01 1.64
Apr 1.27 1.38 1.10 1.52
May 1.33 1.38 1.07 1.50
June 1.28 1.52 0.87 1.40
July 1.67 1.84 0.93 1.75
Aug 1.88 2.09 1.02 1.80
Edit to add: This last line can be simplified by using @JianxunLi's solution.
df.stack().unstack(level='place')
That's a better way of solving the problem of getting year/month on the same index and place on the opposite index, but I'll leave this answer here for now in the event that seeing an alternate method and explanation is helpful.
Upvotes: 2
Reputation: 24752
You can do the reshaping for all Places
and then select just one of them.
import pandas as pd
import numpy as np
# your data
# ===================================
multi_index = pd.MultiIndex.from_product([np.arange(2007,2016,1), 'A B C D E'.split()], names=['Year', 'Place'])
df = pd.DataFrame( np.random.randn(45,12), columns='Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec'.split(), index=multi_index)
df
Jan Feb Mar ... Oct Nov Dec
Year Place ...
2007 A -0.1512 0.7274 -0.3218 ... 1.2547 -1.8408 1.2585
B 0.0856 -1.0458 -1.1428 ... 1.0194 1.1958 0.4905
C -1.2021 -0.6989 -0.0486 ... -0.8053 -0.4929 1.6475
D -1.9948 -0.3465 1.3036 ... -0.2490 0.6285 -0.0568
E 0.0928 -1.3905 0.7203 ... -0.1138 2.9552 -0.0272
2008 A -1.2595 1.3072 0.6121 ... -1.4275 0.8769 2.0671
B 0.3611 -0.4187 -2.9609 ... -1.2944 1.2752 -0.0947
C 1.6492 0.0340 -0.9743 ... 0.0550 1.4135 0.8862
D 0.9034 -0.2957 0.2152 ... 1.0947 -0.2405 0.0367
E 0.9566 1.1927 0.0852 ... 0.7396 0.8240 -1.6628
... ... ... ... ... ... ... ...
2014 A 0.7478 -0.8905 0.6238 ... -1.0907 -0.2919 0.3261
B 3.6764 -0.0601 1.2751 ... 0.3294 -1.3375 -1.5087
C 2.3460 -0.4181 0.0607 ... -0.8270 0.0536 -0.4353
D 0.9733 -0.6863 0.5278 ... -1.8206 0.4788 1.1438
E -0.3514 2.4570 -0.8567 ... 1.3434 -1.5634 -0.9984
2015 A 1.2849 -1.0657 -0.1173 ... -0.1733 0.0441 0.0922
B 0.5802 -0.5912 1.1193 ... -0.1296 -0.6374 -1.7727
C -0.5026 -1.3111 -0.5499 ... 0.7308 1.2570 0.8733
D -1.6482 -0.2213 0.3336 ... -1.3141 -2.0377 -1.1468
E -2.0796 -0.2808 -1.4079 ... -0.3052 0.7999 0.3516
[45 rows x 12 columns]
# processing
# ==================================
res = df.stack().unstack(level='Place')
Place A B C D E
Year
2007 Jan -0.1512 0.0856 -1.2021 -1.9948 0.0928
Feb 0.7274 -1.0458 -0.6989 -0.3465 -1.3905
Mar -0.3218 -1.1428 -0.0486 1.3036 0.7203
Apr -1.4641 2.0384 0.6518 0.8756 -1.4627
May -0.8896 -1.6627 0.6990 0.2008 0.7423
June -0.5339 -0.6629 0.1121 0.3618 1.3838
July -0.4851 0.6544 0.5251 0.3394 -0.7016
Aug -1.2445 0.9671 -1.0684 -0.4776 -0.2936
Sept 1.1330 -0.7543 1.6029 0.5543 0.3234
Oct 1.2547 1.0194 -0.8053 -0.2490 -0.1138
... ... ... ... ... ...
2015 Mar -0.1173 1.1193 -0.5499 0.3336 -1.4079
Apr -1.0528 0.2421 0.3419 -2.1137 -0.2836
May -1.0709 -0.1794 -0.2682 -0.3226 0.8654
June -1.4538 -0.7313 0.3177 -1.4008 1.1357
July -1.6210 -0.3815 -0.9876 0.1019 1.7450
Aug 0.5692 0.7679 1.1893 -0.9612 0.0903
Sept 0.2371 0.6740 0.9204 -0.2909 -0.8197
Oct -0.1733 -0.1296 0.7308 -1.3141 -0.3052
Nov 0.0441 -0.6374 1.2570 -2.0377 0.7999
Dec 0.0922 -1.7727 0.8733 -1.1468 0.3516
[108 rows x 5 columns]
# select one place
res['A']
Year
2007 Jan -0.1512
Feb 0.7274
Mar -0.3218
Apr -1.4641
May -0.8896
June -0.5339
July -0.4851
Aug -1.2445
Sept 1.1330
Oct 1.2547
...
2015 Mar -0.1173
Apr -1.0528
May -1.0709
June -1.4538
July -1.6210
Aug 0.5692
Sept 0.2371
Oct -0.1733
Nov 0.0441
Dec 0.0922
Name: A, dtype: float64
Upvotes: 3
Reputation: 880359
Instead of using unstack
to swap index levels, you could use swaplevel
:
In [157]: df.xs('Johore',level='Place').unstack().swaplevel(0,1)
Out[157]:
Year
2007 Jan 1.26
Feb 1.07
Mar 1.21
Apr 1.27
May 1.33
June 1.28
July 1.67
Aug 1.88
Sept 1.89
Oct 1.86
dtype: float64
There is also the reorder_levels
method which can be useful if your MultiIndex has many levels and need to perform more than one swap.
Upvotes: 2
Reputation: 5797
You should use pandas.pivot
.
An example is here:
Pandas transposition inside dataframe
Upvotes: 0