Reputation: 667
I am struggling to get the right index (restricted to the selection) when using the pandas method xs to select specific data in my dataframe. Let me demonstrate what I am doing:
print(df)
value
idx1 idx2 idx3 idx4 idx5
10 2.0 0.0010 1 2 6.0 ...
2 3 6.0 ...
...
7 8 6.0 ...
8 9 6.0 ...
20 2.0 0.0010 1 2 6.0 ...
2 3 6.0 ...
...
18 19 6.0 ...
19 20 6.0 ...
# get dataframe for idx1 = 10, idx2 = 2.0, idx3 = 0.0010
# (key passed as a tuple; pandas 2.0+ no longer accepts a list here)
print(df.xs((10, 2.0, 0.0010)))
value
idx4 idx5
1 2 6.0 ...
2 3 6.0 ...
3 4 6.0 ...
4 5 6.0 ...
5 6 6.0 ...
6 7 6.0 ...
7 8 6.0 ...
8 9 6.0 ...
# get the first index level of this part of the dataframe
print(df.xs((10, 2.0, 0.0010)).index.levels[0])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
So I do not understand why the full list of values that occur in idx4 is returned, even though we restricted the dataframe to a part where idx4 only takes values from 1 to 8. Am I using the index the wrong way?
Upvotes: 2
Views: 93
Reputation: 294218
This is a known feature, not a bug. pandas preserves all of the index level information after slicing. You can determine which of the level values are actually used, and at what positions, via the index's codes attribute (named labels before pandas 0.24).
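To illustrate the point about preserved levels, here is a minimal sketch. The small three-level frame is hypothetical and only stands in for the question's data; it assumes a pandas version where the attribute is called codes (0.24 or later).

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: idx1 = 10 only uses
# idx4 values 1 and 2, while idx1 = 20 also uses idx4 = 3.
rows = [(10, 1, 2), (10, 2, 3), (20, 1, 2), (20, 2, 3), (20, 3, 4)]
index = pd.MultiIndex.from_tuples(rows, names=["idx1", "idx4", "idx5"])
df = pd.DataFrame({"value": [6.0] * len(rows)}, index=index)

part = df.xs(10)  # keep only idx1 == 10; idx4 and idx5 remain as levels

# levels still lists every idx4 value of the original frame
all_levels = list(part.index.levels[0])

# codes holds, per row, the position in levels of the value actually used
used_codes = list(part.index.codes[0])

# get_level_values returns the values that are really present in the slice
used_values = list(part.index.get_level_values("idx4").unique())

print(all_levels)
print(used_codes)
print(used_values)
```

If only the values present in the slice are needed, get_level_values is usually enough and the index itself does not have to be rebuilt.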
If you want a fresh index that contains only the information relevant to the slice you just made, you can rebuild it like this:
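import pandas as pd
# select the slice; the key is a tuple (pandas 2.0+ no longer accepts a list)
df_new = df.xs((10, 2.0, 0.0010))
# rebuild the index from its tuples so that only values present in the slice become levels
idx_new = pd.MultiIndex.from_tuples(df_new.index.tolist(),
                                    names=df_new.index.names)
df_new.index = idx_new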
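As an alternative to rebuilding the index by hand, newer pandas versions (0.20 and later, as far as I know) offer MultiIndex.remove_unused_levels, which should give the same result. A minimal sketch, again using a small hypothetical frame rather than the question's data:

```python
import pandas as pd

# Hypothetical stand-in for the question's frame.
rows = [(10, 1, 2), (10, 2, 3), (20, 1, 2), (20, 2, 3), (20, 3, 4)]
index = pd.MultiIndex.from_tuples(rows, names=["idx1", "idx4", "idx5"])
df = pd.DataFrame({"value": [6.0] * len(rows)}, index=index)

df_new = df.xs(10)

# Before trimming, idx4 still carries the value 3 from the idx1 == 20 rows.
before = list(df_new.index.levels[0])

# Drop level values that no longer occur in the slice.
df_new.index = df_new.index.remove_unused_levels()
after = list(df_new.index.levels[0])

print(before)
print(after)
```

The data and the row order are unchanged; only the stored level values are trimmed to those that occur in the slice.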
Upvotes: 1