How to delete certain rows in pandas which satisfies some condition

Question

I am using pandas and I have dataset which are looking like this:

ID-cell    TOWNS      NumberOfCrimes
 1          Paris       444
 1          Berlin      333
 1          London      111        
 2          Paris       222
 2          London      555
 2          Berlin      3
 3          Paris       999
 4          Berlin      777
 4          Paris       5
 5          Paris       123
 5          Berlin      8
 6          Paris       1000
 9          Berlin      321
 12         Berlin      1
 12         Berlin      2
 12         Paris       1

        . . .

And its a really big dataset. I need to keep for each city just 5 rows with the highest number of crimes and rest of them to delete.

So my output should look like this:

ID-cell    TOWNS      NumberOfCrimes
 6          Paris       1000
 3          Paris       999     
 1          Paris       444
 2          Paris       222
 5          Paris       123

 4          Berlin      777
 1          Berlin      333
 9          Berlin      321
 5          Berlin      8

 1          London      555        
 2          London      111

I really appreciate the help. I am new in this. And I am working some project for Faculty and my deadline is so close. :/

jpp · Accepted Answer

sort + groupby.head

You can sort by NumberOfCrimes descending, then use groupby + head. Here's an example with your data extracting the single highest NumberOfCrimes by Town.

res = df.sort_values('NumberOfCrimes', ascending=False)\
        .groupby('TOWNS').head(1)

print(res)

   ID-cell   TOWNS  NumberOfCrimes
5        3   Paris             999
4        2  London             555
1        1  Berlin             333

So, for the top 2 or 3 for each town, you can use head(2), head(3), etc.

How to delete certain rows in pandas which satisfies some condition

Answers (2)

sort + groupby.head

Related Questions