Reputation: 1865
I am writing a python script with the the following dataframe in pandas:
dog dog 1 1 1 1 1 1 0 0 1 1
fox 1 1 1 1 1 1 0 0 1 1
the 1 1 1 1 1 1 1 0 1 1
jumps 1 1 1 1 1 1 0 1 1 1
over 1 1 1 1 1 1 0 0 1 1
fox dog 1 1 1 1 1 1 0 0 1 1
fox 1 1 1 1 1 1 0 0 1 1
the 1 1 1 1 1 1 1 0 1 1
jumps 1 1 1 1 1 1 0 1 1 1
over 1 1 1 1 1 1 0 0 1 1
jumps dog 1 1 1 1 1 1 1 0 1 0
fox 1 1 1 1 1 1 1 0 1 0
the 1 0 1 1 1 1 0 0 1 0
jumps 1 1 1 1 1 1 0 0 1 0
over 1 0 1 1 1 0 0 1 1 0
over dog 1 1 1 1 1 1 0 0 1 0
fox 1 1 1 1 1 1 0 0 1 0
the 1 0 1 1 1 0 0 1 1 0
jumps 1 1 0 1 0 1 1 0 1 0
over 1 1 1 1 1 1 0 0 1 0
the dog 1 1 1 1 1 1 0 1 1 0
fox 1 1 1 1 1 1 0 1 1 0
the 1 1 1 1 1 1 0 0 1 0
jumps 1 1 0 1 1 1 0 0 1 0
over 1 1 0 1 0 1 1 0 1 0
Here I want to eliminate any row containing the word 'fox' in either first or second level row index so the new dataframe becomes:
dog dog 1 1 1 1 1 1 0 0 1 1
the 1 1 1 1 1 1 1 0 1 1
jumps 1 1 1 1 1 1 0 1 1 1
over 1 1 1 1 1 1 0 0 1 1
jumps dog 1 1 1 1 1 1 1 0 1 0
the 1 0 1 1 1 1 0 0 1 0
jumps 1 1 1 1 1 1 0 0 1 0
over 1 0 1 1 1 0 0 1 1 0
over dog 1 1 1 1 1 1 0 0 1 0
the 1 0 1 1 1 0 0 1 1 0
jumps 1 1 0 1 0 1 1 0 1 0
over 1 1 1 1 1 1 0 0 1 0
the dog 1 1 1 1 1 1 0 1 1 0
the 1 1 1 1 1 1 0 0 1 0
jumps 1 1 0 1 1 1 0 0 1 0
over 1 1 0 1 0 1 1 0 1 0
It would be advantageous if I can eliminate multiple words like this in a single query. For example 'fox' and 'over'. I have tried using a combination of df.xs and df.drop but nothing seems to work properly. Any ideas?
Upvotes: 1
Views: 59
Reputation: 21729
If you have column names (colname) defined, this might work:
df = df.loc[(df.index != 'fox') & (df.colname != 'fox')]
or, if it's a multi-indexed data frame, by resetting the index, you can do:
df = df.reset_index(drop=False)
df = df.loc[(df.index != 'fox') & (df.colname != 'fox')]
Upvotes: 0
Reputation: 164753
This is a minimal example:
df = pd.DataFrame([['dog', 'dog', 1], ['dog', 'fox', 1], ['dog', 'the', 1],
['fox', 'dog', 0], ['fox', 'fox', 0], ['fox', 'the', 0],
['jumps', 'dog', 1], ['jumps', 'fox', 1], ['jumps', 'the', 1]],
columns=['A', 'B', 'C'])
df = df.set_index(['A', 'B'])
# C
# A B
# dog dog 1
# fox 1
# the 1
# fox dog 0
# fox 0
# the 0
# jumps dog 1
# fox 1
# the 1
def remover(df, lst):
return df.drop(lst, level=0).drop(lst, level=1)
df = df.pipe(remover, ['fox', 'dog'])
# C
# A B
# jumps the 1
Upvotes: 1