kekert

Reputation: 966

pandas.DataFrame.query keeping original multiindex

I have a DataFrame with a MultiIndex:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)

>>> df
           col1  col2
ind1 ind2            
A    0        2     0
     1        2     2
     2        1     2
B    3        2     2
C    4        4     0
     5        1     4

When I select data using .loc[] on one of the index levels and then apply .query(), the resulting index is "shrunk", as expected, to match only the values contained in the resulting DataFrame:

>>> df.loc['A'].query('col2 == 2')

      col1  col2
ind2            
1        2     2
2        1     2

>>> df.loc['A'].query('col2 == 2').index

Int64Index([1, 2], dtype='int64', name='ind2')

However, when I try to get the same result using just .query(), pandas keeps the same index levels as in the original DataFrame (even though it didn't behave like that above, in the single-index case, where the resulting index went from [0, 1, 2] to [1, 2], matching only the col2 == 2 rows):

>>> df.query('ind1 == "A" & col2 == 2')

           col1  col2
ind1 ind2            
A    1        2     2
     2        1     2

>>> df.query('ind1 == "A" & col2 == 2').index

MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4, 5]],
           labels=[[0, 0], [1, 2]],
           names=['ind1', 'ind2'])

Is this a bug or a feature? If it is a feature, could you please explain this behavior?

EDIT1: I would expect the following index instead:

MultiIndex(levels=[['A'], [1, 2]],
           labels=[[0, 0], [0, 1]],
           names=['ind1', 'ind2'])
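
For reference, a minimal sketch of one way to get the index shown in EDIT1, assuming a pandas version that provides MultiIndex.remove_unused_levels() (0.20 or later). The column values are copied from the frame printed above rather than drawn at random, and the variable names are only illustrative:

```python
import pandas as pd

# Fixed copy of the frame printed in the question (the original used random data).
df = pd.DataFrame(
    {"col1": [2, 2, 1, 2, 4, 1], "col2": [0, 2, 2, 2, 0, 4]},
    index=pd.MultiIndex.from_arrays(
        [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
    ),
)

filtered = df.query('ind1 == "A" & col2 == 2')

# Filtering keeps the full set of level values; only the row labels change.
full_levels = [list(level) for level in filtered.index.levels]

# remove_unused_levels() returns a new MultiIndex restricted to the values in use.
trimmed = filtered.index.remove_unused_levels()
trimmed_levels = [list(level) for level in trimmed.levels]

print(full_levels)
print(trimmed_levels)
```

The rows selected are the same in both cases; only the stored level values differ, so assigning the trimmed index back (filtered.index = trimmed) should not change the data.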

EDIT2: As explained in "Dataframe Slice does not remove Index Values", index values shouldn't be removed at all when slicing a DataFrame; that behavior should give the following result:

>>> df.loc['A'].query('col2 == 2')

      col1  col2
ind2            
1        2     2
2        1     2

>>> df.loc['A'].query('col2 == 2').index

EXPECTATION: Int64Index([0, 1, 2], dtype='int64', name='ind2')
REALITY:     Int64Index([1, 2], dtype='int64', name='ind2')

Upvotes: 4

Views: 1146

Answers (1)

MaxU - stand with Ukraine

Reputation: 210982

df.loc['A'] returns a DataFrame (or a "view") with a regular ("single") index:

In [12]: df.loc['A']
Out[12]:
      col1  col2
ind2
0        1     1
1        0     3
2        1     2

so .query() is applied to that DataFrame, which has a regular index rather than a MultiIndex...
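
A small sketch of the point above, using a fixed copy of the question's frame (the values are taken from its printed output, not from this answer's random data). The xs(..., drop_level=False) call at the end is an addition for comparison, not something the question used:

```python
import pandas as pd

# Fixed copy of the question's frame so the result is reproducible.
df = pd.DataFrame(
    {"col1": [2, 2, 1, 2, 4, 1], "col2": [0, 2, 2, 2, 0, 4]},
    index=pd.MultiIndex.from_arrays(
        [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
    ),
)

# Selecting one key of the outer level drops that level entirely.
sub = df.loc["A"]
is_multi = isinstance(sub.index, pd.MultiIndex)  # False: a flat index named 'ind2'

# A flat index only holds the labels of the rows that are present,
# so filtering leaves just the surviving labels.
flat_labels = list(sub.query("col2 == 2").index)

# For comparison: xs() with drop_level=False keeps both levels.
kept = df.xs("A", level="ind1", drop_level=False)
kept_nlevels = kept.index.nlevels

print(is_multi, flat_labels, kept_nlevels)
```

A flat index has no separate store of level values, which is why nothing "extra" can be retained there, while a MultiIndex keeps its level values apart from the row labels.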

Upvotes: 1
