Reputation: 5126
I was answering another question here on SO and ran into a problem filtering a DataFrame with df.drop. Here is the example I gave:
import pandas as pd
import langdetect
df = pd.DataFrame({'Sentence':['es muy bueno','run, Forest! Run!','Ήξερα ότι θα εξετάζατε τον Μεταφραστή Google', 'This is Certainly en']})
df['Language'] = df['Sentence'].apply(lambda x: langdetect.detect(x))
# output
   Sentence                                      Language
0  es muy bueno                                  es
1  run, Forest! Run!                             ro
2  Ήξερα ότι θα εξετάζατε τον Μεταφραστή Google  el
3  This is Certainly en                          en
Now I wanted to drop all rows where the language is not en. Using df.drop(df['Language'] != 'en') unexpectedly returns:
   Sentence                                      Language
2  Ήξερα ότι θα εξετάζατε τον Μεταφραστή Google  el
3  This is Certainly en                          en
However, evaluating the Boolean mask on its own returns what I expected:
df['Language'] != 'en'
# output
0 True
1 True
2 True
3 False
Name: Language, dtype: bool
Now, I can get around this by using df.loc[df['Language'] == 'en']. But I am wondering why drop is behaving this way, or whether I've done something wrong?
Upvotes: 3
Views: 104
Reputation: 38415
DataFrame.drop takes index or column labels, not a Boolean mask. From the documentation:
labels : single label or list-like
Index or column labels to drop.
When you pass the following Series to df.drop on the default axis (axis=0), its values are treated as row labels rather than as a mask. Since False and True are equal to 0 and 1, it drops the rows labeled 0 and 1:
df['Language'] != 'en'
0 True
1 True
2 True
3 False
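To see why the mask collapses to just two labels, here is a minimal sketch that does not call df.drop at all; it only shows how Python treats the Boolean values. The variable names mask and labels_seen are mine, and how pandas resolves Boolean labels may differ between versions.

```python
import pandas as pd

# bool is a subclass of int, so True/False compare (and hash) equal to 1/0.
assert True == 1 and False == 0
assert hash(True) == hash(1) and hash(False) == hash(0)

# The same mask as above, built by hand.
mask = pd.Series([True, True, True, False], name="Language")

# Read as a list of labels, the mask holds only two distinct values.
labels_seen = sorted({int(v) for v in mask})
print(labels_seen)
```

So however many rows the mask covers, it can only ever name the labels 0 and 1.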
Though it can be done using df.drop as in @Wen's answer, the most idiomatic way is Boolean indexing or df.query.
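A rough sketch of the three options side by side. The Language column is hard-coded here so the example does not depend on langdetect, and the drop variant (passing the index labels of the unwanted rows) is my assumption about what a drop-based answer would look like, not a quote of it.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentence": [
        "es muy bueno",
        "run, Forest! Run!",
        "Ήξερα ότι θα εξετάζατε τον Μεταφραστή Google",
        "This is Certainly en",
    ],
    # Hard-coded to match the detected languages shown in the question.
    "Language": ["es", "ro", "el", "en"],
})

# 1. Boolean indexing: keep the rows where the mask is True.
by_mask = df[df["Language"] == "en"]

# 2. df.query: the same filter written as an expression string.
by_query = df.query("Language == 'en'")

# 3. df.drop: pass the index labels of the rows to remove, not the mask.
by_drop = df.drop(df[df["Language"] != "en"].index)

print(by_mask)
```

All three keep only the row labeled 3; the drop version just takes an extra step to turn the mask into labels.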
Upvotes: 2