Remove duplicates but keep some

Question

Is it possible to remove duplicates but keep the last 3 or 4 rows for each ID? Something like:

 df = df.drop_duplicates(['ID'], keep='last_four')

Thank you

EdChum · Accepted Answer

There is no such keep option in drop_duplicates, but you can get the same result with groupby and tail, passing the number of rows you want to keep per group:

In [5]:
# data setup
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':[0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,3,3,3,3,3,3,3,3,3,4], 'val':np.arange(25)})
df
Out[5]:
    ID  val
0    0    0
1    0    1
2    0    2
3    0    3
4    0    4
5    0    5
6    1    6
7    1    7
8    1    8
9    1    9
10   1   10
11   1   11
12   1   12
13   2   13
14   2   14
15   3   15
16   3   16
17   3   17
18   3   18
19   3   19
20   3   20
21   3   21
22   3   22
23   3   23
24   4   24

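Now group by ID and call tail(4) to keep the last four rows of each group (groups with fewer than four rows are returned in full):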

In [11]:
df.groupby('ID', as_index=False).tail(4)

Out[11]:
    ID  val
2    0    2
3    0    3
4    0    4
5    0    5
9    1    9
10   1   10
11   1   11
12   1   12
13   2   13
14   2   14
20   3   20
21   3   21
22   3   22
23   3   23
24   4   24
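
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name keep_last_n and its parameters are made up for illustration and are not part of pandas. It rebuilds the example frame from above so that it runs on its own.

```python
import numpy as np
import pandas as pd


def keep_last_n(frame, key, n):
    """Return at most the last n rows of each group (hypothetical helper).

    Rows come back in their original order with their original index,
    because groupby(...).tail(n) filters rows rather than aggregating them.
    """
    return frame.groupby(key).tail(n)


# Same data as in the answer: IDs 0..4 with 6, 7, 2, 9 and 1 rows.
df = pd.DataFrame({
    'ID': [0] * 6 + [1] * 7 + [2] * 2 + [3] * 9 + [4],
    'val': np.arange(25),
})

result = keep_last_n(df, 'ID', 4)

# Number of rows kept per ID; no group should exceed n.
counts = result.groupby('ID').size()

# With n=1 this selects the same rows as drop_duplicates(keep='last').
last_only = keep_last_n(df, 'ID', 1)
```

Use head(n) instead of tail(n) if you want to keep the first n rows of each group rather than the last.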
