Zizi96

Drop specified rows from data frame

How do I drop a limited number of rows? So far my code drops every instance I give. In the example below, every instance of 'dog' is dropped. Instead, I would like to drop a specified number of instances, for example only 2 instances of dog. Ideally, the instances to drop would be sampled at random.

import pandas as pd

num = [10, 20, 30, 10, 40, 50, 20, 60, 70, 20]
color = ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red']
animal = ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog']

# 'data' instead of 'dict' so the built-in dict is not shadowed
data = {'Number': num, 'Color': color, 'Animal': animal}
df = pd.DataFrame(data)

# Drops EVERY row whose Animal is in to_drop
to_drop = ['dog']
trimmed_df = df[~df['Animal'].isin(to_drop)]
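For the single-label case, the goal can be expressed as a minimal sketch (not taken from either answer): sample the index labels of the matching rows with `DataFrame.sample` and pass them to `DataFrame.drop`. The `random_state=0` argument is an assumption added only to make the draw reproducible; leave it out to get a different random pair each run.

```python
import pandas as pd

num = [10, 20, 30, 10, 40, 50, 20, 60, 70, 20]
color = ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red']
animal = ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog']
df = pd.DataFrame({'Number': num, 'Color': color, 'Animal': animal})

# Pick 2 of the 'dog' rows at random (seeded here only for reproducibility)
dog_rows = df[df['Animal'] == 'dog']
to_remove = dog_rows.sample(n=2, random_state=0).index

# Drop exactly those rows by index label; all other rows are untouched
trimmed_df = df.drop(index=to_remove)

print((trimmed_df['Animal'] == 'dog').sum())  # 1
```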


Answers (2)

ALollz

If you have multiple animals with different amounts to drop, you can use groupby + sample. Store the animals and amounts in a dict, then sample the correct number of rows to keep from each group.

This drops rows at random, and if you specify an N greater than the number of rows for an animal, all of that animal's rows are dropped.

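to_drop = {'dog': 2, 'raccoon': 1}  # animal -> number of rows to drop

kept = []
for animal, gp in df.groupby('Animal'):
    # Keep a random subset of len(gp) - N rows, never fewer than 0
    kept.append(gp.sample(n=max(0, len(gp) - to_drop.get(animal, 0)), replace=False))

pd.concat(kept).sort_index()

One possible result (which rows are dropped changes from run to run):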

   Number   Color   Animal
1      20   white      cat
3      10   green    gecko
4      40   white     bear
5      50  orange  raccoon
7      60   black     goat
8      70    blue     goat
9      20     red      dog
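The loop above can be wrapped in a reusable function. This is a sketch; the function name `drop_random_per_label` and the `random_state` parameter are hypothetical additions, the latter only to make runs reproducible. The second call illustrates the behaviour described earlier: asking for more rows than a group has removes the whole group.

```python
import pandas as pd

def drop_random_per_label(df, col, to_drop, random_state=None):
    """Randomly remove to_drop[label] rows from each group of df[col].

    Labels missing from to_drop lose nothing; an N larger than the group
    size removes every row of that group.
    """
    kept = []
    for label, gp in df.groupby(col):
        n_keep = max(0, len(gp) - to_drop.get(label, 0))
        kept.append(gp.sample(n=n_keep, replace=False, random_state=random_state))
    # Restore the original row order
    return pd.concat(kept).sort_index()

df = pd.DataFrame({
    'Number': [10, 20, 30, 10, 40, 50, 20, 60, 70, 20],
    'Color': ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red'],
    'Animal': ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog'],
})

out = drop_random_per_label(df, 'Animal', {'dog': 2, 'raccoon': 1}, random_state=1)
print(out)

# Only 2 goats exist, so asking to drop 5 removes both
over = drop_random_per_label(df, 'Animal', {'goat': 5})
print('goat' in set(over['Animal']))  # False
```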

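The loop above isn't very efficient. A faster approach borrows @QuangHoang's idea of using cumcount: first shuffle the entire DataFrame with .sample(frac=1) so that the rows to drop are chosen at random, then compare each row's cumcount within its animal against that animal's cut-off. pandas aligns the shuffled counts back to the original rows by index, and animals that are not in to_drop map to NaN, which never compares as less-than, so their rows are kept.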

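to_drop = {'dog': 2, 'raccoon': 1}

# Shuffle, number the rows 0, 1, 2, ... within each animal, and flag the
# first to_drop[animal] of them for removal
m = (df.sample(frac=1).groupby('Animal').cumcount()
       .lt(df['Animal'].map(to_drop)))
df = df[~m]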
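The same shuffle-then-cumcount idea can be packaged as a function. In this sketch the name `drop_n_random` and the `random_state` parameter are hypothetical; it also reindexes the shuffled counter back to the original row order explicitly and fills unlisted animals with a quota of 0, instead of relying on implicit index alignment and NaN comparisons.

```python
import pandas as pd

def drop_n_random(df, col, to_drop, random_state=None):
    # Shuffle the rows, then number them 0, 1, 2, ... within each label
    shuffled = df.sample(frac=1, random_state=random_state)
    rank = shuffled.groupby(col).cumcount().reindex(df.index)
    # Per-row drop quota; labels not in to_drop get 0, so they are never dropped
    quota = df[col].map(to_drop).fillna(0)
    # A row is dropped when its shuffled rank is below its label's quota
    return df[~rank.lt(quota)]

df = pd.DataFrame({
    'Number': [10, 20, 30, 10, 40, 50, 20, 60, 70, 20],
    'Color': ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red'],
    'Animal': ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog'],
})

out = drop_n_random(df, 'Animal', {'dog': 2, 'raccoon': 1}, random_state=42)
print(out)
```

Because boolean indexing keeps the original order, no `sort_index` is needed afterwards.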


Quang Hoang

You can try:

to_drop = ['dog']
s = df['Animal'].isin(to_drop)

# True only for the first 2 matching rows (in original order)
mask = s & s.cumsum().le(2)

df[~mask]

Output:

   Number   Color   Animal
1      20   white      cat
2      30   black  raccoon
3      10   green    gecko
4      40   white     bear
5      50  orange  raccoon
7      60   black     goat
8      70    blue     goat
9      20     red      dog
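Note that `s.cumsum()` counts matches across every label in `to_drop` together, so with several labels the limit of 2 is shared rather than applied per label, and the dropped rows are always the first ones rather than random ones. The following check (the two-label `to_drop` list is an illustrative assumption) drops only the first dog and the first raccoon:

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [10, 20, 30, 10, 40, 50, 20, 60, 70, 20],
    'Color': ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red'],
    'Animal': ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog'],
})

to_drop = ['dog', 'raccoon']
s = df['Animal'].isin(to_drop)
# cumsum runs over all matching rows: 1, 1, 2, 2, 2, 3, 4, 4, 4, 5
# so only row 0 (dog) and row 2 (raccoon) satisfy s & cumsum <= 2
mask = s & s.cumsum().le(2)
result = df[~mask]
print(result.index.tolist())  # [1, 3, 4, 5, 6, 7, 8, 9]
```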

Update: If to_drop has multiple labels and you want to drop 2 instances of each, use groupby().cumcount() instead, which counts within each animal separately (it still drops the first occurrences, not random ones):

mask = (df['Animal'].isin(to_drop) &
        df.groupby('Animal').cumcount().lt(2)
       )
print(df[~mask])
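If each label needs a different number of drops, the fixed `2` can be replaced by a per-row quota looked up from a dict. This deterministic sketch assumes hypothetical quotas of 2 dogs and 1 raccoon and drops the first occurrences of each:

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [10, 20, 30, 10, 40, 50, 20, 60, 70, 20],
    'Color': ['red', 'white', 'black', 'green', 'white', 'orange', 'white', 'black', 'blue', 'red'],
    'Animal': ['dog', 'cat', 'raccoon', 'gecko', 'bear', 'raccoon', 'dog', 'goat', 'goat', 'dog'],
})

to_drop = {'dog': 2, 'raccoon': 1}             # hypothetical per-animal quotas
rank = df.groupby('Animal').cumcount()         # 0, 1, 2, ... within each animal
quota = df['Animal'].map(to_drop).fillna(0)    # 0 for animals not listed
result = df[~rank.lt(quota)]
print(result.index.tolist())  # [1, 3, 4, 5, 7, 8, 9]
```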
