Reputation: 453
I have a dataframe that looks like the following:
ip_address malware_type
ip_1 malware_1
ip_2 malware_2
ip_1 malware_1
ip_1 malware_1
ip_1 malware_2
ip_2 malware_2
ip_2 malware_3
.
.
.
I want to drop duplicate rows based on the 'ip_address' column; however, when dropping, I want to keep only the 'malware_type' value that is most frequent for each IP. So the resulting dataframe should look like:
ip_address malware_type
ip_1 malware_1
ip_2 malware_2
.
.
.
I would really appreciate any help to achieve the above. Thanks.
Upvotes: 3
Views: 2299
Reputation: 20669
You can use GroupBy.agg with pd.Series.mode:
df.groupby('ip_address').malware_type.agg(pd.Series.mode)
ip_address
ip_1 malware_1
ip_2 malware_2
Name: malware_type, dtype: object
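One caveat: pd.Series.mode returns every value tied for most frequent, so a group with a tie yields an array rather than a scalar. A minimal sketch on a hypothetical frame mirroring the question's data, taking the first mode to force a single value per group:

```python
import pandas as pd

# Hypothetical data matching the layout in the question.
df = pd.DataFrame({
    'ip_address': ['ip_1', 'ip_2', 'ip_1', 'ip_1', 'ip_1', 'ip_2', 'ip_2'],
    'malware_type': ['malware_1', 'malware_2', 'malware_1', 'malware_1',
                     'malware_2', 'malware_2', 'malware_3'],
})

# x.mode() returns tied values in sorted order, so .iloc[0]
# picks one deterministically even when there is a tie.
out = df.groupby('ip_address').malware_type.agg(lambda x: x.mode().iloc[0])
print(out)
```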
You can also use scipy.stats.mode here. Its .mode attribute is an array in older SciPy, so take the first element; note that recent SciPy versions no longer accept non-numeric data in mode.
from scipy.stats import mode
df.groupby('ip_address').malware_type.agg(lambda x: mode(x).mode[0])
ip_address
ip_1 malware_1
ip_2 malware_2
Name: malware_type, dtype: object
Another option is to use collections.Counter's most_common method.
from collections import Counter

def md(s):
    # Return the single most frequent value in the group.
    c = Counter(s)
    return c.most_common(1)[0][0]

df.groupby('ip_address').malware_type.agg(md)
ip_address
ip_1 malware_1
ip_2 malware_2
Name: malware_type, dtype: object
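Assuming the same hypothetical frame as in the question, the Counter approach runs end to end like this. One design note: most_common breaks ties by first appearance within the group, whereas Series.mode sorts tied values.

```python
import pandas as pd
from collections import Counter

# Hypothetical data matching the layout in the question.
df = pd.DataFrame({
    'ip_address': ['ip_1', 'ip_2', 'ip_1', 'ip_1', 'ip_1', 'ip_2', 'ip_2'],
    'malware_type': ['malware_1', 'malware_2', 'malware_1', 'malware_1',
                     'malware_2', 'malware_2', 'malware_3'],
})

def md(s):
    # Most frequent value; ties resolve to the value seen first.
    return Counter(s).most_common(1)[0][0]

out = df.groupby('ip_address').malware_type.agg(md)
print(out)
```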
Upvotes: 0
Reputation: 323326
Let us try Series.mode:
s=df.groupby('ip_address').malware_type.agg(lambda x : x.mode()[0]) # .reset_index()
ip_address
ip_1 malware_1
ip_2 malware_2
Name: malware_type, dtype: object
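The commented-out .reset_index() turns the resulting Series back into a two-column frame like the one in the question. A minimal sketch, assuming the same hypothetical data:

```python
import pandas as pd

# Hypothetical data matching the layout in the question.
df = pd.DataFrame({
    'ip_address': ['ip_1', 'ip_2', 'ip_1', 'ip_1', 'ip_1', 'ip_2', 'ip_2'],
    'malware_type': ['malware_1', 'malware_2', 'malware_1', 'malware_1',
                     'malware_2', 'malware_2', 'malware_3'],
})

s = df.groupby('ip_address').malware_type.agg(lambda x: x.mode()[0])
result = s.reset_index()  # back to an ip_address / malware_type DataFrame
print(result)
```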
Upvotes: 5