chuky pedro

Reputation: 745

How to get the most recurring row from a pandas dataframe column

I would like to get the most frequently occurring amount, along with its description, from the dataframe below. The actual dataframe is longer than what is displayed here.

dataframe

          description                        amount type    cosine_group
1       295302|service fee 295302|microloa  1500.0   D         24
2       1292092|rpmt microloan|71302        20000.0  D         31
3       qr q10065116702 fund trf 0002080661 30000.0  D         12
4       qr q10060597280 fund trf 0002080661 30000.0  D         12
5       1246175|service fee 1246175|microlo 3000.0   D         24
6       qr q10034118487 fund trf 0002080661 2000.0   D         12

Here I tried using the groupby function

df.groupby(['cosine_group'])['amount'].value_counts()[:2]

the above code returns

cosine_group  amount 
12            30000.0    7
              30000.0    6
       

I need the description alongside the most recurring amount

Expected output is :

     description                                amount
   qr q10065116702 fund trf 0002080661         30000.0  
   qr q10060597280 fund trf 0002080661         30000.0

Upvotes: 1

Views: 76

Answers (1)

Andreas

Reputation: 9207

You can use mode:

  description  amount type
0           A           15
1           B         2000
2           C         3000
3           C         3000
4           C         3000
5           D           30
6           E           20
7           A           15

df[df['amount type'].eq(df['amount type'].mode().loc[0])]

  description  amount type
2           C         3000
3           C         3000
4           C         3000

Explanation:

df[mask] # selects the rows where the boolean Series (the mask) is True
df['amount type'].eq(3000) # .eq() is the method form of ==; it returns a boolean Series
df['amount type'].mode() # returns the most common value(s) as a Series, since several values can tie
df['amount type'].mode().loc[0] # takes the entry at index 0, giving a scalar instead of a Series
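
Applied to the question's data, the same idea could look roughly like the sketch below. It assumes the column is named `amount` as in the question, rebuilds only the six rows shown there (the real dataframe is longer), and uses `isin` rather than `.loc[0]` so that rows are kept for every tied mode; the variable names `most_common` and `result` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the full dataframe is longer.
df = pd.DataFrame(
    {
        "description": [
            "295302|service fee 295302|microloa",
            "1292092|rpmt microloan|71302",
            "qr q10065116702 fund trf 0002080661",
            "qr q10060597280 fund trf 0002080661",
            "1246175|service fee 1246175|microlo",
            "qr q10034118487 fund trf 0002080661",
        ],
        "amount": [1500.0, 20000.0, 30000.0, 30000.0, 3000.0, 2000.0],
        "type": ["D"] * 6,
        "cosine_group": [24, 31, 12, 12, 24, 12],
    },
    index=range(1, 7),
)

# mode() returns a Series because several values can tie for most frequent.
most_common = df["amount"].mode()

# isin() keeps rows matching any of the tied modes, not just the first one.
result = df.loc[df["amount"].isin(most_common), ["description", "amount"]]
print(result)
```

On these six rows this should select the two `qr ... fund trf` rows with amount 30000.0. If only a single mode is wanted even when there is a tie, `df["amount"].mode().loc[0]` picks the smallest of the tied values, since `mode()` returns them sorted.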

Upvotes: 1
