Reputation: 7681
I have some data in a pandas dataframe which looks like this:
gene VIM
time:2|treatment:TGFb|dose:0.1 -0.158406
time:2|treatment:TGFb|dose:1 0.039158
time:2|treatment:TGFb|dose:10 -0.052608
time:24|treatment:TGFb|dose:0.1 0.157153
time:24|treatment:TGFb|dose:1 0.206030
time:24|treatment:TGFb|dose:10 0.132580
time:48|treatment:TGFb|dose:0.1 -0.144209
time:48|treatment:TGFb|dose:1 -0.093910
time:48|treatment:TGFb|dose:10 -0.166819
time:6|treatment:TGFb|dose:0.1 0.097548
time:6|treatment:TGFb|dose:1 0.026664
time:6|treatment:TGFb|dose:10 -0.008032
where the left column is the index. This is just a subsection of the data, which is actually much larger. The index is composed of three components: time, treatment and dose. I want to reorganize this data so that I can access it easily by slicing. The way to do this is to use a pandas MultiIndex, but I don't know how to convert my DataFrame with a single index into one with three levels. Does anybody know how to do this?
To clarify, the desired output here is the same data with a three-level index: the outer level being treatment, the middle dose and the inner time. This would be useful because I could then access the data with something like df['time']['dose'] or df[0] (or something to that effect at least).
Upvotes: 1
Views: 442
Reputation: 863226
You can first replace the unnecessary strings (the index has to be converted to a Series by to_series, because replace doesn't work with an index yet) and then use str.split with expand=True to turn the result into a MultiIndex. Last, set the index level names with rename_axis (new in pandas 0.18.0).
df.index = df.index.to_series().replace({'time:':'','treatment:': '','dose:':''}, regex=True)
df.index = df.index.str.split('|', expand=True)
df = df.rename_axis(('time','treatment','dose'))
print (df)
                         VIM
time treatment dose
2    TGFb      0.1  -0.158406
               1     0.039158
               10   -0.052608
24   TGFb      0.1   0.157153
               1     0.206030
               10    0.132580
48   TGFb      0.1  -0.144209
               1    -0.093910
               10   -0.166819
6    TGFb      0.1   0.097548
               1     0.026664
               10   -0.008032
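The question asked for treatment as the outer level, dose in the middle and time innermost, with easy slicing afterwards. The following is a minimal sketch of one way to get there, not the only one. It rebuilds a six-row stand-in for the frame using values from the post, splits the labels with plain Python instead of str.split (an assumption made only to keep the example self-contained), reorders the levels and then slices. Note that the level values stay strings in this sketch, so lookups use '1' and '24' rather than numbers.

```python
import pandas as pd

# Small stand-in for the question's frame (values copied from the post).
df = pd.DataFrame(
    {"VIM": [-0.158406, 0.039158, -0.052608, 0.157153, 0.206030, 0.132580]},
    index=[
        "time:2|treatment:TGFb|dose:0.1",
        "time:2|treatment:TGFb|dose:1",
        "time:2|treatment:TGFb|dose:10",
        "time:24|treatment:TGFb|dose:0.1",
        "time:24|treatment:TGFb|dose:1",
        "time:24|treatment:TGFb|dose:10",
    ],
)

# Strip the "name:" prefixes, then split each label on the literal "|".
cleaned = df.index.str.replace(r"(time|treatment|dose):", "", regex=True)
df.index = pd.MultiIndex.from_tuples(
    [tuple(label.split("|")) for label in cleaned],
    names=["time", "treatment", "dose"],
)

# Reorder to treatment / dose / time and sort so partial slicing works cleanly.
df = df.reorder_levels(["treatment", "dose", "time"]).sort_index()

# All time points for one treatment and one dose.
one_dose = df.loc[("TGFb", "1"), :]

# Cross-section on a single level, wherever it sits in the index.
at_24h = df.xs("24", level="time")

print(one_dose)
print(at_24h)
```

Selecting with a tuple in loc walks the levels from the outside in, while xs picks a value on any named level. If numeric slicing on time or dose is needed, the levels would have to be converted to numbers first.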
Upvotes: 1