JDAR
JDAR

Reputation: 1

Python: How to show only the sport with the highest number of athletes by country?

I need to find for each value in column 'country' the sport with the highest number of athletes. My data frame has columns 'country', 'sport', 'name'.

Tried using .groupby methods but cant figure it out

tried df.groupby('country', 'sport').count() this gives me the number of athletes in each sport grouped by country, but I am not sure how to find the biggest sport for each country from here.

Any help would be great :)

Upvotes: 0

Views: 487

Answers (1)

Corralien
Corralien

Reputation: 120439

Your code:

df.groupby('country', 'sport').count()

is equivalent to:

df.value_counts(['country', 'sport'])

So from here, you can group the output by 'country' and get the first row of each group because data are already sorted in descending order by default:

As suggested in comments, you have to setup a MRE to get help *:

# For example
import pandas as pd
import numpy as np
import string

countries = ['France', 'Spain', 'Italy']
sports = ['Football', 'Tennis', 'Basketball']
names = list(string.ascii_uppercase)
N = len(names)

df = pd.DataFrame({'country': np.random.choice(countries, N+1),
                   'sport': np.random.choice(sports, N+1), 
                   'name': names})
>>> df.value_counts(['country', 'sport']).groupby('country').head(1)
country  sport     
France   Football      5
Italy    Basketball    4
Spain    Tennis        3
dtype: int64

* If you have already some data to share, a simpler MRE is to copy/paste the output of print(df.head(20).to_string(index=False)) if the sample is relevant. With a such output, we can use pd.read_clipboard() to load the data in a DataFrame.

Upvotes: 1

Related Questions