Reputation: 966
I have a DataFrame with a MultiIndex:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)
>>> df
           col1  col2
ind1 ind2
A    0        2     0
     1        2     2
     2        1     2
B    3        2     2
C    4        4     0
     5        1     4
When I select data using .loc[] on one of the index levels and then apply .query(), the resulting index is "shrunk", as expected, to contain only the values present in the resulting DataFrame:
>>> df.loc['A'].query('col2 == 2')
      col1  col2
ind2
1        2     2
2        1     2
>>> df.loc['A'].query('col2 == 2').index
Int64Index([1, 2], dtype='int64', name='ind2')
However, when I try to get the same result using just .query(), pandas keeps the same index levels as in the original DataFrame (even though it did not behave like that above, in the single-index case, where the resulting index went from [0, 1, 2] to [1, 2], matching only the col2 == 2 rows):
>>> df.query('ind1 == "A" & col2 == 2')
           col1  col2
ind1 ind2
A    1        2     2
     2        1     2
>>> df.query('ind1 == "A" & col2 == 2').index
MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4, 5]],
           labels=[[0, 0], [1, 2]],
           names=['ind1', 'ind2'])
Is this a bug or a feature? If it is a feature, could you please explain this behavior?
EDIT1: I would expect the following index instead:
MultiIndex(levels=[['A'], [1, 2]],
           labels=[[0, 0], [0, 1]],
           names=['ind1', 'ind2'])
EDIT2: As explained in "Dataframe Slice does not remove Index Values", index values shouldn't be removed at all when slicing a DataFrame. By that logic, the single-index case should give the following result:
>>> df.loc['A'].query('col2 == 2')
      col1  col2
ind2
1        2     2
2        1     2
>>> df.loc['A'].query('col2 == 2').index
EXPECTATION: Int64Index([0, 1, 2], dtype='int64', name='ind2')
REALITY: Int64Index([1, 2], dtype='int64', name='ind2')
Upvotes: 4
Views: 1146
Reputation: 210982
df.loc['A'] returns a DataFrame (or a "view") with a regular ("single") index:
In [12]: df.loc['A']
Out[12]:
      col1  col2
ind2
0        1     1
1        0     3
2        1     2
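so .query() is applied to that DataFrame, which has a regular index. A regular index stores its values directly, so filtering rows removes the corresponding values, whereas a MultiIndex keeps its level values separately from the per-row labels that point into them.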
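The difference can be checked directly. The following is only a minimal sketch, not part of the original answer: it uses fixed data copied from the question's printed frame instead of random values, writes the query with `and` rather than `&`, and assumes a pandas version that provides MultiIndex.remove_unused_levels() (which the original posts do not mention) as one possible way to get the "shrunk" levels from EDIT1.

```python
import pandas as pd

# Fixed data matching the frame printed in the question (assumption: the
# random values are replaced by constants so the result is reproducible).
df = pd.DataFrame(
    {"col1": [2, 2, 1, 2, 4, 1], "col2": [0, 2, 2, 2, 0, 4]},
    index=pd.MultiIndex.from_arrays(
        [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
    ),
)

# Selecting one label of the first level drops that level entirely,
# leaving a regular single-level index.
single = df.loc["A"]
single_nlevels = single.index.nlevels

# Filtering the MultiIndex frame keeps all original level values,
# even those that no remaining row refers to.
filtered = df.query('ind1 == "A" and col2 == 2')
full_levels = [level.tolist() for level in filtered.index.levels]

# remove_unused_levels() returns a new MultiIndex whose levels contain
# only the values still used by the remaining rows.
trimmed = filtered.copy()
trimmed.index = trimmed.index.remove_unused_levels()
trimmed_levels = [level.tolist() for level in trimmed.index.levels]

print(single_nlevels)
print(full_levels)
print(trimmed_levels)
```

The rows and their index tuples are the same before and after trimming; only the stored level values change, which is why both frames print identically.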
Upvotes: 1