Eliminating a row index and its rows completely from a dataframe

Question

I am writing a python script with the the following dataframe in pandas:

dog   dog     1   1   1   1   1   1   0   0   1   1
      fox     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
fox   dog     1   1   1   1   1   1   0   0   1   1
      fox     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
jumps dog     1   1   1   1   1   1   1   0   1   0
      fox     1   1   1   1   1   1   1   0   1   0
      the     1   0   1   1   1   1   0   0   1   0
      jumps   1   1   1   1   1   1   0   0   1   0
      over    1   0   1   1   1   0   0   1   1   0
over  dog     1   1   1   1   1   1   0   0   1   0
      fox     1   1   1   1   1   1   0   0   1   0
      the     1   0   1   1   1   0   0   1   1   0
      jumps   1   1   0   1   0   1   1   0   1   0
      over    1   1   1   1   1   1   0   0   1   0
the   dog     1   1   1   1   1   1   0   1   1   0
      fox     1   1   1   1   1   1   0   1   1   0
      the     1   1   1   1   1   1   0   0   1   0
      jumps   1   1   0   1   1   1   0   0   1   0
      over    1   1   0   1   0   1   1   0   1   0

Here I want to eliminate any row containing the word 'fox' in either first or second level row index so the new dataframe becomes:

dog   dog     1   1   1   1   1   1   0   0   1   1
      the     1   1   1   1   1   1   1   0   1   1
      jumps   1   1   1   1   1   1   0   1   1   1
      over    1   1   1   1   1   1   0   0   1   1
jumps dog     1   1   1   1   1   1   1   0   1   0
      the     1   0   1   1   1   1   0   0   1   0
      jumps   1   1   1   1   1   1   0   0   1   0
      over    1   0   1   1   1   0   0   1   1   0
over  dog     1   1   1   1   1   1   0   0   1   0
      the     1   0   1   1   1   0   0   1   1   0
      jumps   1   1   0   1   0   1   1   0   1   0
      over    1   1   1   1   1   1   0   0   1   0
the   dog     1   1   1   1   1   1   0   1   1   0
      the     1   1   1   1   1   1   0   0   1   0
      jumps   1   1   0   1   1   1   0   0   1   0
      over    1   1   0   1   0   1   1   0   1   0

It would be advantageous if I can eliminate multiple words like this in a single query. For example 'fox' and 'over'. I have tried using a combination of df.xs and df.drop but nothing seems to work properly. Any ideas?

jpp · Accepted Answer

This is a minimal example:

df = pd.DataFrame([['dog', 'dog', 1], ['dog', 'fox', 1], ['dog', 'the', 1],
                   ['fox', 'dog', 0], ['fox', 'fox', 0], ['fox', 'the', 0],
                   ['jumps', 'dog', 1], ['jumps', 'fox', 1], ['jumps', 'the', 1]],
                  columns=['A', 'B', 'C'])

df = df.set_index(['A', 'B'])

#            C
# A     B     
# dog   dog  1
#       fox  1
#       the  1
# fox   dog  0
#       fox  0
#       the  0
# jumps dog  1
#       fox  1
#       the  1

def remover(df, lst):
    return df.drop(lst, level=0).drop(lst, level=1)

df = df.pipe(remover, ['fox', 'dog'])

#            C
# A     B     
# jumps the  1

Eliminating a row index and its rows completely from a dataframe

Answers (2)

Related Questions