pandas multiindex selecting...how to get the right (restricted to selection) index

Question

I am struggling to get the right (restricted to the selection) index when using the pandas method xs to select specific data in my DataFrame. Let me demonstrate what I am doing:

print(df)
                                                             value
idx1              idx2          idx3         idx4  idx5            
10                2.0           0.0010          1     2        6.0  ...   
                                                2     3        6.0  ...   
...
                                                7     8        6.0  ...   
                                                8     9        6.0  ...  
20                2.0           0.0010          1     2        6.0  ...  
                                                2     3        6.0  ...  
...
                                                18    19       6.0  ...  
                                                19    20       6.0  ...  

# get dataframe for idx1 = 10, idx2 = 2.0, idx3 = 0.0010 
print(df.xs((10, 2.0, 0.0010)))

             value
idx4  idx5            
1     2        6.0  ...   
2     3        6.0  ...   
3     4        6.0  ...     
4     5        6.0  ...     
5     6        6.0  ...     
6     7        6.0  ...     
7     8        6.0  ...   
8     9        6.0  ...  

# get the first index list of this part of the dataframe
print(df.xs((10, 2.0, 0.0010)).index.levels[0])

[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19]

So I do not understand why the full list of values that occur in idx4 is returned, even though I restricted the DataFrame to a part where idx4 only takes values from 1 to 8. Am I using the index in the wrong way?
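A minimal, self-contained reproduction may make the behaviour easier to see. It is a sketch on hypothetical toy data: three levels named idx1, idx2, idx3 stand in for the question's five, and a scalar key is used for xs. It compares levels[0] of the cross-section's index with the values the remaining rows actually carry, obtained via get_level_values.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's frame: under idx1 == 10,
# idx2 runs 1..3; under idx1 == 20 it runs 1..5 (idx3 is just idx2 + 1).
rows = [(10, i, i + 1) for i in range(1, 4)] + [(20, i, i + 1) for i in range(1, 6)]
index = pd.MultiIndex.from_tuples(rows, names=["idx1", "idx2", "idx3"])
df = pd.DataFrame({"value": 6.0}, index=index)

sub = df.xs(10)  # cross-section at idx1 == 10; idx1 is dropped from the index

# levels[0] still lists every idx2 value defined in the original frame...
defined = sub.index.levels[0].tolist()
# ...while get_level_values returns the values the remaining rows actually use.
used = sub.index.get_level_values("idx2").unique().tolist()
print(defined)  # [1, 2, 3, 4, 5]
print(used)     # [1, 2, 3]
```

If only the values present in the slice are needed, get_level_values (optionally followed by unique) is the direct route. levels describes the set of values the index may refer to, not the set it currently does.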

piRSquared · Accepted Answer

This is a known feature, not a bug: pandas preserves all of the index information. You can determine which of the level values are actually used, and at which positions, via the MultiIndex's labels attribute (called codes since pandas 0.24).
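The sketch below illustrates how the codes attribute (the newer name for labels) maps each row into the full level. It again uses a hypothetical three-level toy frame rather than the question's data. Each entry of codes[k] is the integer position, within levels[k], of that row's value. Taking the distinct codes and looking them up in the level therefore recovers only the values the slice uses.

```python
import pandas as pd

# Hypothetical toy frame standing in for the question's data.
rows = [(10, i, i + 1) for i in range(1, 4)] + [(20, i, i + 1) for i in range(1, 6)]
index = pd.MultiIndex.from_tuples(rows, names=["idx1", "idx2", "idx3"])
df = pd.DataFrame({"value": 6.0}, index=index)

sub_index = df.xs(10).index

# codes[k][row] is the position, within levels[k], of that row's value.
idx2_codes = sub_index.codes[0]
idx2_level = sub_index.levels[0]  # still the full level: 1..5

# Which positions of the full level does the slice actually reference?
used_positions = sorted(set(idx2_codes.tolist()))
used_values = idx2_level.take(used_positions).tolist()
print(idx2_codes.tolist())  # [0, 1, 2]
print(used_values)          # [1, 2, 3]
```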

If you want a fresh index that contains only the information relevant to the slice you just made, you can rebuild it like this:

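import pandas as pd

df_new = df.xs((10, 2.0, 0.0010))  # tuple key; pandas 2.x rejects a list here
# Rebuild the MultiIndex from the tuples actually present in the slice
idx_new = pd.MultiIndex.from_tuples(df_new.index.to_series(),
                                    names=df_new.index.names)
df_new.index = idx_new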
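In pandas 0.20 and later, MultiIndex.remove_unused_levels() performs this cleanup in a single call. It returns a new MultiIndex whose levels keep only the values that some row still refers to. The sketch uses the same kind of hypothetical toy frame as above, not the question's data. It is an alternative to the from_tuples rebuild, not the answer's original code.

```python
import pandas as pd

# Hypothetical toy frame; rows with idx1 == 10 only use idx2 values 1..3.
rows = [(10, i, i + 1) for i in range(1, 4)] + [(20, i, i + 1) for i in range(1, 6)]
index = pd.MultiIndex.from_tuples(rows, names=["idx1", "idx2", "idx3"])
df = pd.DataFrame({"value": 6.0}, index=index)

df_new = df.xs(10)
# Drop level values no longer referenced by any row (codes are remapped).
df_new.index = df_new.index.remove_unused_levels()
print(df_new.index.levels[0].tolist())  # [1, 2, 3]
print(df_new.index.levels[1].tolist())  # [2, 3, 4]
```

Compared with rebuilding from tuples, this keeps the index names and avoids materialising every row as a Python tuple.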
