Reputation: 641
Is it possible to remove duplicates but keep the last 3-4 rows for each ID? Something like:
df = df.drop_duplicates(['ID'], keep='last_four')
Thank you
Upvotes: 1
Views: 87
Reputation: 394071
You can use groupby and tail, passing the number of rows you wish to keep per group, to achieve the same result:
In [5]:
import numpy as np
import pandas as pd

# data setup
df = pd.DataFrame({'ID':[0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,3,3,3,3,3,3,3,3,3,4], 'val':np.arange(25)})
df
Out[5]:
    ID  val
0    0    0
1    0    1
2    0    2
3    0    3
4    0    4
5    0    5
6    1    6
7    1    7
8    1    8
9    1    9
10   1   10
11   1   11
12   1   12
13   2   13
14   2   14
15   3   15
16   3   16
17   3   17
18   3   18
19   3   19
20   3   20
21   3   21
22   3   22
23   3   23
24   4   24
Now group by ID and call tail:
In [11]:
df.groupby('ID', as_index=False).tail(4)
Out[11]:
    ID  val
2    0    2
3    0    3
4    0    4
5    0    5
9    1    9
10   1   10
11   1   11
12   1   12
13   2   13
14   2   14
20   3   20
21   3   21
22   3   22
23   3   23
24   4   24
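For reference, here is the same idea as a self-contained script, offered as a minimal sketch rather than the only way to do it. The helper name keep_last_n is made up for illustration and is not part of pandas. The second half shows what should be an equivalent boolean-mask form using cumcount, which can be handy if you want to combine the "last n per group" condition with other filters.

```python
import numpy as np
import pandas as pd


def keep_last_n(frame, key, n):
    """Hypothetical helper: keep at most the last n rows of each group."""
    # tail() on a groupby acts as a filter, so the original index
    # and row order are preserved.
    return frame.groupby(key).tail(n)


# Same data as above: 6, 7, 2, 9 and 1 rows for IDs 0 through 4.
df = pd.DataFrame({
    'ID': [0] * 6 + [1] * 7 + [2] * 2 + [3] * 9 + [4],
    'val': np.arange(25),
})

last_four = keep_last_n(df, 'ID', 4)

# Alternative: number the rows of each group counting from the end
# (the last row gets 0) and keep those numbered below 4.
mask = df.groupby('ID').cumcount(ascending=False) < 4
last_four_mask = df[mask]

print(last_four)
```

Groups with fewer than n rows (IDs 2 and 4 here) are kept whole, since tail simply returns whatever rows exist.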
Upvotes: 2