Nils
Nils

Reputation: 409

Pandas MultiIndex, selecting values by 1. and 2. level

I´ve some problems by selecting values within 1. and 2. level.

I´ve got an MultiIndex by setting the header = [0,1]

In[1]:  df = pd.read_csv('Data.txt', sep='\t', header=[0,1], skipinitialspace=True)

In[2]:  print(df.columns)

Out[2]: MultiIndex(
        levels=[['20052065', '20052066', '20052082', '20052087', '20052089'], 
                ['CTF1', 'CTF2', 'CTF3', 'CTF_M', 'CTM1', 'CTM2', 'CTM3', 'CTM_M']],
        labels=[[...]],
        names=[...])

If it try to get the data of the 2. level value and selected element from 1. level i get the following output:

In[3]:  print(df['20052065'][['CTF1','CTF_M']])

Out[3]: TIME[s]     CTF1    CTF_M
        0.000    -14.386   14.963
        60.000   -26.937   34.729
        120.000  -29.986   58.265
            ...      ...      ...

Now i tried to generate the output for 2 Elements and did something like this:

In[4]:  print(df[['20052065','20052066']][['CTF1','CTF_M']])

Out[4]: KeyError: "['CTF1' 'CTF_M'] not in index"

Somehow this doesn´t work. Maybe you know what terrible went wrong?

Thanks for help.

Edit: In[1]: print(df) looks like:

Out[1]:          ELEMENT 20052065 20052066 20052082 20052087 20052089 20052090  \
       TIME[s]   TEMP[C]     CTF1     CTF1     CTF1     CTF1     CTF1     CTF1   
       0.000      24.000   -4.234   -6.728  -14.386   -4.356   -6.926  -10.205   
       60.000     36.137  -29.308  -24.795  -26.937  -30.134  -24.735  -23.474 
          ...        ...      ...      ...      ...      ...      ...      ...

The *.txt file looks like:

enter image description here

Upvotes: 2

Views: 1211

Answers (2)

unutbu
unutbu

Reputation: 879611

You could use df.loc:

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([['A','B','C'],['X','Y','Z']])
df = pd.DataFrame(np.random.randint(10, size=(3,len(columns))), columns=columns)
#    A        B        C      
#    X  Y  Z  X  Y  Z  X  Y  Z
# 0  2  7  5  1  6  0  5  0  0
# 1  8  4  7  2  0  8  7  3  9
# 2  0  6  8  8  1  1  8  0  2

# In some cases `sort_index` may be needed to avoid UnsortedIndexError
df = df.sort_index(axis=1)
print(df.loc[:, (['A','B'],['X','Y'])])

yields (something like):

   A     B   
   X  Y  X  Y
0  2  7  1  6
1  8  4  2  0
2  0  6  8  1

If you only want to select, say, the ('A','Y') and ('B','X') columns, then note that you can specify MultiIndexed columns as tuples:

In [37]: df.loc[:, [('A','Y'),('B','X')]]
Out[37]: 
   A  B
   Y  X
0  7  1
1  4  2
2  6  8

or even just df[[('A','Y'),('B','X')]] (which yields the same result).

And in general it is better to use a single indexer such as df.loc[...] instead of double indexing (e.g. df[...][...]). It can be quicker (because it makes fewer calls to __getitem__, and generates fewer temporary sub-DataFrames) and df.loc[...] = value it is the correct way to make assignments to sub-slices of a DataFrame which modify df itself.


The reason why df[['A','B']][['X','Y']] would not work is because df[['A','B']] returns a DataFrame with a MultiIndex:

In [36]: df[['A','B']]
Out[36]: 
   A        B      
   X  Y  Z  X  Y  Z
0  2  7  5  1  6  0
1  8  4  7  2  0  8
2  0  6  8  8  1  1

So indexing this DataFrame with ['X','Y'] fails because there are no top-level column labels named 'X' or 'Y'.


Sometimes, depending on how the DataFrame was constructed (or due to operations performed on the DataFrame) the MultiIndex needs to be lexsorted before it can be sliced. There is a boxed warning in the docs mentioning this issue. To lexsort the column index use

df = df.sort_index(axis=1)

Upvotes: 2

jezrael
jezrael

Reputation: 862681

I think need slicers:

print (df)
  20052065 20052066 20052065 20052066 20052065 20052066
      CTF1    CTF_M   CTF_M1   CTF_Mr        V        A
0        1        2        4        5        6        7

df = df.sort_index(axis=1)
idx = pd.IndexSlice
print (df.loc[:, idx[['20052065','20052066'], ['CTF1','CTF_M']]])
  20052065 20052066
      CTF1    CTF_M
0        1        2

Upvotes: 2

Related Questions