Reputation: 409
I´ve some problems by selecting values within 1. and 2. level.
I´ve got an MultiIndex by setting the
header = [0,1]
In[1]: df = pd.read_csv('Data.txt', sep='\t', header=[0,1], skipinitialspace=True)
In[2]: print(df.columns)
Out[2]: MultiIndex(
levels=[['20052065', '20052066', '20052082', '20052087', '20052089'],
['CTF1', 'CTF2', 'CTF3', 'CTF_M', 'CTM1', 'CTM2', 'CTM3', 'CTM_M']],
labels=[[...]],
names=[...])
If it try to get the data of the 2. level value and selected element from 1. level i get the following output:
In[3]: print(df['20052065'][['CTF1','CTF_M']])
Out[3]: TIME[s] CTF1 CTF_M
0.000 -14.386 14.963
60.000 -26.937 34.729
120.000 -29.986 58.265
... ... ...
Now i tried to generate the output for 2 Elements and did something like this:
In[4]: print(df[['20052065','20052066']][['CTF1','CTF_M']])
Out[4]: KeyError: "['CTF1' 'CTF_M'] not in index"
Somehow this doesn´t work. Maybe you know what terrible went wrong?
Thanks for help.
Edit: In[1]: print(df)
looks like:
Out[1]: ELEMENT 20052065 20052066 20052082 20052087 20052089 20052090 \
TIME[s] TEMP[C] CTF1 CTF1 CTF1 CTF1 CTF1 CTF1
0.000 24.000 -4.234 -6.728 -14.386 -4.356 -6.926 -10.205
60.000 36.137 -29.308 -24.795 -26.937 -30.134 -24.735 -23.474
... ... ... ... ... ... ... ...
The *.txt file looks like:
Upvotes: 2
Views: 1211
Reputation: 879611
You could use df.loc
:
import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_product([['A','B','C'],['X','Y','Z']])
df = pd.DataFrame(np.random.randint(10, size=(3,len(columns))), columns=columns)
# A B C
# X Y Z X Y Z X Y Z
# 0 2 7 5 1 6 0 5 0 0
# 1 8 4 7 2 0 8 7 3 9
# 2 0 6 8 8 1 1 8 0 2
# In some cases `sort_index` may be needed to avoid UnsortedIndexError
df = df.sort_index(axis=1)
print(df.loc[:, (['A','B'],['X','Y'])])
yields (something like):
A B
X Y X Y
0 2 7 1 6
1 8 4 2 0
2 0 6 8 1
If you only want to select, say, the ('A','Y')
and ('B','X')
columns, then note that you can specify MultiIndexed columns as tuples:
In [37]: df.loc[:, [('A','Y'),('B','X')]]
Out[37]:
A B
Y X
0 7 1
1 4 2
2 6 8
or even just df[[('A','Y'),('B','X')]]
(which yields the same result).
And in general it is better to use a single indexer such as df.loc[...]
instead of double indexing (e.g. df[...][...]
). It can be quicker (because it makes fewer calls to __getitem__
, and generates fewer temporary sub-DataFrames) and df.loc[...] = value
it is the correct way to make assignments to sub-slices of a DataFrame which modify df
itself.
The reason why df[['A','B']][['X','Y']]
would not work is because
df[['A','B']]
returns a DataFrame with a MultiIndex:
In [36]: df[['A','B']]
Out[36]:
A B
X Y Z X Y Z
0 2 7 5 1 6 0
1 8 4 7 2 0 8
2 0 6 8 8 1 1
So indexing this DataFrame with ['X','Y']
fails because there are no top-level column labels named 'X'
or 'Y'
.
Sometimes, depending on how the DataFrame was constructed (or due to operations performed on the DataFrame) the MultiIndex needs to be lexsorted before it can be sliced. There is a boxed warning in the docs mentioning this issue. To lexsort the column index use
df = df.sort_index(axis=1)
Upvotes: 2
Reputation: 862681
I think need slicers:
print (df)
20052065 20052066 20052065 20052066 20052065 20052066
CTF1 CTF_M CTF_M1 CTF_Mr V A
0 1 2 4 5 6 7
df = df.sort_index(axis=1)
idx = pd.IndexSlice
print (df.loc[:, idx[['20052065','20052066'], ['CTF1','CTF_M']]])
20052065 20052066
CTF1 CTF_M
0 1 2
Upvotes: 2