Reputation: 1
I need to find for each value in column 'country' the sport with the highest number of athletes. My data frame has columns 'country', 'sport', 'name'.
Tried using .groupby methods but cant figure it out
tried df.groupby('country', 'sport').count() this gives me the number of athletes in each sport grouped by country, but I am not sure how to find the biggest sport for each country from here.
Any help would be great :)
Upvotes: 0
Views: 487
Reputation: 120439
Your code:
df.groupby('country', 'sport').count()
is equivalent to:
df.value_counts(['country', 'sport'])
So from here, you can group the output by 'country' and get the first row of each group because data are already sorted in descending order by default:
As suggested in comments, you have to setup a MRE to get help *:
# For example
import pandas as pd
import numpy as np
import string
countries = ['France', 'Spain', 'Italy']
sports = ['Football', 'Tennis', 'Basketball']
names = list(string.ascii_uppercase)
N = len(names)
df = pd.DataFrame({'country': np.random.choice(countries, N+1),
'sport': np.random.choice(sports, N+1),
'name': names})
>>> df.value_counts(['country', 'sport']).groupby('country').head(1)
country sport
France Football 5
Italy Basketball 4
Spain Tennis 3
dtype: int64
* If you have already some data to share, a simpler MRE is to copy/paste the output of print(df.head(20).to_string(index=False))
if the sample is relevant. With a such output, we can use pd.read_clipboard()
to load the data in a DataFrame.
Upvotes: 1