Reputation: 453
I am using pandas and I have dataset which are looking like this:
ID-cell TOWNS NumberOfCrimes
1 Paris 444
1 Berlin 333
1 London 111
2 Paris 222
2 London 555
2 Berlin 3
3 Paris 999
4 Berlin 777
4 Paris 5
5 Paris 123
5 Berlin 8
6 Paris 1000
9 Berlin 321
12 Berlin 1
12 Berlin 2
12 Paris 1
. . .
And its a really big dataset. I need to keep for each city just 5 rows with the highest number of crimes and rest of them to delete.
So my output should look like this:
ID-cell TOWNS NumberOfCrimes
6 Paris 1000
3 Paris 999
1 Paris 444
2 Paris 222
5 Paris 123
4 Berlin 777
1 Berlin 333
9 Berlin 321
5 Berlin 8
1 London 555
2 London 111
I really appreciate the help. I am new in this. And I am working some project for Faculty and my deadline is so close. :/
Upvotes: 3
Views: 60
Reputation: 164803
You can sort by NumberOfCrimes
descending, then use groupby
+ head
. Here's an example with your data extracting the single highest NumberOfCrimes by Town.
res = df.sort_values('NumberOfCrimes', ascending=False)\
.groupby('TOWNS').head(1)
print(res)
ID-cell TOWNS NumberOfCrimes
5 3 Paris 999
4 2 London 555
1 1 Berlin 333
So, for the top 2 or 3 for each town, you can use head(2)
, head(3)
, etc.
Upvotes: 3
Reputation: 323366
Using
df.sort_values('NumberOfCrimes').drop_duplicates('ID-cell',keep='last')
Out[404]:
ID-cell TOWNS NumberOfCrimes
0 1 Paris 444
4 2 London 555
5 3 Paris 999
Upvotes: 2