pandas.DataFrame.query keeping original multiindex

Question

I have a DataFrame with a MultiIndex:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 5, (6, 2)), columns=['col1', 'col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1', 'ind2'], inplace=True)

>>> df
           col1  col2
ind1 ind2            
A    0        2     0
     1        2     2
     2        1     2
B    3        2     2
C    4        4     0
     5        1     4

When I select data using .loc[] on one of the index levels and apply .query() afterwards, the resulting index is "shrunk", as expected, to contain only the values present in the resulting DataFrame:

>>> df.loc['A'].query('col2 == 2')

      col1  col2
ind2            
1        2     2
2        1     2

>>> df.loc['A'].query('col2 == 2').index

Int64Index([1, 2], dtype='int64', name='ind2')

However, when I try to get the same result using just .query(), pandas keeps the same index levels as the original DataFrame. This differs from the single-index case above, where the resulting index went from [0, 1, 2] to [1, 2], matching only the col2 == 2 rows:

>>> df.query('ind1 == "A" & col2 == 2')

           col1  col2
ind1 ind2            
A    1        2     2
     2        1     2

>>> df.query('ind1 == "A" & col2 == 2').index

MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4, 5]],
           labels=[[0, 0], [1, 2]],
           names=['ind1', 'ind2'])

Is this a bug or a feature? If it is a feature, could you please explain this behavior?

EDIT1: I would expect the following index instead:

MultiIndex(levels=[['A'], [1, 2]],
           labels=[[0, 0], [0, 1]],
           names=['ind1', 'ind2'])
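
To make the difference concrete: the rows in the filtered frame already match that expectation, and only the stored levels differ. Below is a minimal sketch, assuming pandas 0.20 or newer (where MultiIndex.remove_unused_levels() is available) and using fixed values in place of the random data above so the result is reproducible. The variable names are mine, for illustration only.

```python
import pandas as pd

# Fixed values instead of np.random.randint, so the output is reproducible
df = pd.DataFrame(
    {"col1": [2, 2, 1, 2, 4, 1], "col2": [0, 2, 2, 2, 0, 4]},
    index=pd.MultiIndex.from_arrays(
        [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
    ),
)

filtered = df.query('ind1 == "A" & col2 == 2')

# The levels of the filtered index still list every original value
full_levels = [level.tolist() for level in filtered.index.levels]

# remove_unused_levels() returns a new MultiIndex with trimmed levels
trimmed = filtered.index.remove_unused_levels()
trimmed_levels = [level.tolist() for level in trimmed.levels]

print(full_levels)     # [['A', 'B', 'C'], [0, 1, 2, 3, 4, 5]]
print(trimmed_levels)  # [['A'], [1, 2]]
```

Assigning the trimmed index back (filtered.index = trimmed) should give the index I expected, without changing any rows.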

EDIT2: As explained in "Dataframe Slice does not remove Index Values", index values shouldn't be removed at all when slicing a DataFrame. By that logic, the single-index case should give the following result:

>>> df.loc['A'].query('col2 == 2')

      col1  col2
ind2            
1        2     2
2        1     2

>>> df.loc['A'].query('col2 == 2').index

EXPECTATION: Int64Index([0, 1, 2], dtype='int64', name='ind2')
REALITY:     Int64Index([1, 2], dtype='int64', name='ind2')
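
One way to look at the discrepancy, as a sketch under my own assumptions rather than a confirmed explanation: a flat Index holds its labels directly, whereas a MultiIndex holds a lookup table of levels plus integer codes pointing into it. The snippet below builds both kinds of index by hand (the names flat, multi, and so on are hypothetical) and takes positions 1 and 2 from each.

```python
import pandas as pd

# A flat index and a MultiIndex over the same six rows
flat = pd.Index(list(range(6)), name="ind2")
multi = pd.MultiIndex.from_arrays(
    [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
)

# Take positions 1 and 2 from each
flat_sub = flat[[1, 2]]
multi_sub = multi[[1, 2]]

# The flat index only contains the selected labels
flat_labels = flat_sub.tolist()

# The MultiIndex labels that are actually present also shrink...
present = multi_sub.get_level_values("ind2").tolist()

# ...but the lookup table behind them keeps all original values
lookup = multi_sub.levels[1].tolist()

print(flat_labels)  # [1, 2]
print(present)      # [1, 2]
print(lookup)       # [0, 1, 2, 3, 4, 5]
```

So in both cases the labels that are present shrink to [1, 2]; only the levels attribute of the MultiIndex retains the unused values.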
