kaecvtionr
kaecvtionr

Reputation: 153

How do you group, sort, and limit in Python Pandas? (i.e. Get Top 10)

I have a Pandas dataframe that has columns actor_id and account_id. Actor is a person and account is simply an account. So a person can have more than one account and accounts can have multiple people.

My goal is to group by actor_id and then rank the actor_ids by the number of accounts they have so that I can get a list of the Top 10 actors with the most accounts.

In SQL, it would be something like SELECT actor_id, account_id, COUNT(account_id) GROUP BY actor_id LIMIT 10. But I am Trying to do this in Python.

I referenced this Pandas group and sort by index count but it did not work for me. Below is the code I've tried.

df['count'] = df['actor_id'].map(df['account_id'].value_counts())
df.sort_index('count', ascending=False)

The dataset looks like: Dataset

In the picture, replace project_id with account_id.

Upvotes: 1

Views: 3561

Answers (1)

Ben.T
Ben.T

Reputation: 29635

You can do:

df_nb_acc = (
    df.groupby('actor_id')['account_id'] #groupby actor_id, select the column account_id
      .count() # count the number of accout per actor
      .reset_index() # actor_id become a column and not indexes 
      .rename(columns={'account_id':'Nb_account'}) # to rename the column
      .sort_values('Nb_account',axis=1, ascending=False)
      # to sort the value on the column Nb_account, largest to smallest
    )

And to get the top 10 then do df_nb_acc.head(10)

Upvotes: 2

Related Questions