MLnoob

Reputation: 161

How to calculate mode over two columns in a python dataframe?

There are two columns in my CSV: FirstName and LastName. I need to find the most common full name. For example:

FirstName      LastName  
A                 X  
A                 P  
A                 Y  
A                 Z                   
B                 X  
B                 Z  
C                 X  
C                 W  
C                 W  

I have tried using the mode function:

df["FirstName"].mode()[0]  
df["LastName"].mode()[0]  

But it won't work across two columns.

The mode of each column is:

FirstName : A - occurs 4 times
LastName : X - occurs 3 times

But the output should be "C W", as this is the full name that occurs most often.

Upvotes: 0

Views: 3141

Answers (3)

Sreeram TP

Reputation: 11907

You can concatenate the two columns and take the mode of the result:

(df['FirstName'] + df['LastName']).mode()[0]

# Output : 'CW'

If you need a space between the first and last names, concatenate ' ' between them like this:

(df['FirstName'] + ' ' + df['LastName']).mode()[0]
# Output : 'C W'
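
For reference, here is a minimal, self-contained sketch of the concatenation approach using the sample data from the question. The `collision` frame is a made-up illustration (not from the question) of why a separator can matter: without one, different first/last pairs may collapse into the same string.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FirstName": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "LastName":  ["X", "P", "Y", "Z", "X", "Z", "X", "W", "W"],
})

# Join the two columns with a separator, then take the most frequent value.
full_names = df["FirstName"] + " " + df["LastName"]
most_common = full_names.mode()[0]
print(most_common)  # C W

# Hypothetical data: without a separator, "AB" + "C" and "A" + "BC"
# both become "ABC", so two different people are counted as one.
collision = pd.DataFrame({"FirstName": ["AB", "A"], "LastName": ["C", "BC"]})
merged = collision["FirstName"] + collision["LastName"]
print(merged.nunique())  # 1
```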

Upvotes: 3

anon01

Reputation: 11161

You can concatenate those into a single string with:

full_names = df.FirstName + df.LastName
full_names.mode()[0]
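
One thing worth knowing about `mode()[0]`: `Series.mode()` returns every value tied for the highest count, in sorted order, so `[0]` only picks the first of them. A small sketch with made-up names (not the question's data) where two full names tie:

```python
import pandas as pd

# Hypothetical data: "A X" and "B Z" both occur twice.
full_names = pd.Series(["B Z", "A X", "B Z", "A X", "C W"])

modes = full_names.mode()
print(modes.tolist())  # ['A X', 'B Z']
print(modes[0])        # A X
```

If ties matter for your data, inspect the whole result of `mode()` rather than only its first element.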

Upvotes: 0

Vaishali

Reputation: 38415

You can combine the columns of each row into a tuple and take the mode:

df.apply(tuple, 1).mode()[0]

# Output : ('C', 'W')
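
As an alternative sketch (not part of the answer above), you could count each (FirstName, LastName) pair with `groupby(...).size()` and take the label of the largest count. This keeps the two columns separate and also gives you the count. It assumes the same sample data as the question; note that `idxmax()` returns only the first pair if several are tied.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FirstName": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "LastName":  ["X", "P", "Y", "Z", "X", "Z", "X", "W", "W"],
})

# Number of rows for each distinct (FirstName, LastName) pair.
pair_counts = df.groupby(["FirstName", "LastName"]).size()

top_pair = pair_counts.idxmax()     # index label of the largest count
top_count = int(pair_counts.max())  # how often that pair occurs
print(top_pair, top_count)  # ('C', 'W') 2
```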

Upvotes: 3
