Boosted_d16

Reputation: 14112

Delete multiple pandas DataFrame rows where a column value is this or that

I have a DataFrame which looks like this:

                                    Label                   Type  
Name                                                              
ppppp                         Base brute          UnweightedBase  
pbaaa                               Base                    Base  
pb4a1                      Très à gauche                Category 
pb4a2                           A gauche   pb4a2        Category  
pb4a3                          Au centre   pb4a3        Category  
pb4a4                           A droite   pb4a4        Category  

If the "Type" column's value is "UnweightedBase" or "Base", I would like to delete that row from the data.

I can do this, but only for one value at a time, with the following code:

to_del = df[df['Type'] == "UnweightedBase"].index.tolist()

df= df.drop(to_del, axis)
return df

How do I modify my code so that I can delete more than one value at once?

My failed attempt:

to_del = df[df['Type'] in ["UnweightedBase","Base"]].index.tolist()

df= df.drop(to_del, axis)
return df

Upvotes: 0

Views: 4705

Answers (1)

unutbu

Reputation: 880717

You could select the desired rows and reassign the resultant DataFrame to df:

In [60]: df = df.loc[~df['Type'].isin(['UnweightedBase', 'Base'])]

In [61]: df
Out[61]: 
    Name              Label      Type
2  pb4a1      Très à gauche  Category
3  pb4a2   A gauche   pb4a2  Category
4  pb4a3  Au centre   pb4a3  Category
5  pb4a4   A droite   pb4a4  Category
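For a version that runs without the original data file, here is a minimal sketch of the same selection. The DataFrame is rebuilt by hand from the table in the question, and the names unwanted, mask and kept are only illustrative:

```python
import pandas as pd

# Hand-built copy of the question's data (an assumption; the real frame
# was read from a file).
df = pd.DataFrame(
    {
        "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
        "Label": ["Base brute", "Base", "Très à gauche",
                  "A gauche", "Au centre", "A droite"],
        "Type": ["UnweightedBase", "Base", "Category",
                 "Category", "Category", "Category"],
    }
)

unwanted = ["UnweightedBase", "Base"]

# isin returns a boolean Series that is True where Type is one of the
# unwanted values; ~ inverts it so only the other rows are kept.
mask = df["Type"].isin(unwanted)
kept = df.loc[~mask]

print(kept)
```

Adding more values to the unwanted list is all it takes to filter out further types.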

I think this is more direct and safer than using

to_del = df[df['Type'].isin(type_val)].index.tolist()
df= df.drop(to_del, axis)

since the latter does essentially the same selection as an intermediate step:

df[df['Type'].isin(type_val)]

Moreover, index.tolist() returns index labels, and drop removes every row that carries one of those labels. If the index has non-unique values, you might delete unintended rows.

For example:

In [85]: df = pd.read_table('data', sep=r'\s{4,}', engine='python')

In [86]: df.index = ['a','b','c','d','e','a']

In [87]: df
Out[87]: 
    Name              Label            Type
a  ppppp         Base brute  UnweightedBase
b  pbaaa               Base            Base
c  pb4a1      Très à gauche        Category
d  pb4a2   A gauche   pb4a2        Category
e  pb4a3  Au centre   pb4a3        Category
a  pb4a4   A droite   pb4a4        Category  #<-- note the repeated index

In [88]: to_del = df[df['Type'].isin(['UnweightedBase', 'Base'])].index.tolist()

In [89]: to_del
Out[89]: ['a', 'b']

In [90]: df = df.drop(to_del)

In [91]: df
Out[91]: 
    Name              Label      Type
c  pb4a1      Très à gauche  Category
d  pb4a2   A gauche   pb4a2  Category
e  pb4a3  Au centre   pb4a3  Category
#<--- Oops, we've lost the last row, even though its Type was Category.
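To compare the two approaches side by side on a non-unique index, here is a small self-contained sketch. The frame is a hand-built, trimmed copy of the data (Name and Type only, which is an assumption made for brevity), and by_label and by_mask are illustrative names:

```python
import pandas as pd

# Trimmed, hand-built copy of the example data with a repeated label 'a'.
df = pd.DataFrame(
    {
        "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
        "Type": ["UnweightedBase", "Base", "Category",
                 "Category", "Category", "Category"],
    },
    index=["a", "b", "c", "d", "e", "a"],
)

unwanted = ["UnweightedBase", "Base"]

# Label-based: collect the labels of the unwanted rows, then drop them.
# drop removes every row labelled 'a', including the Category row.
to_del = df[df["Type"].isin(unwanted)].index.tolist()
by_label = df.drop(to_del)

# Mask-based: rows are selected by position in the boolean mask,
# so the duplicated label does not matter.
by_mask = df.loc[~df["Type"].isin(unwanted)]

print(by_label)
print(by_mask)
```

The boolean mask keeps all four Category rows, while the label-based drop keeps only three.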

Upvotes: 3
