Reputation: 153
I have a Pandas dataframe that has columns actor_id and account_id. Actor is a person and account is simply an account. So a person can have more than one account and accounts can have multiple people.
My goal is to group by actor_id and then rank the actor_ids by the number of accounts they have so that I can get a list of the Top 10 actors with the most accounts.
In SQL, it would be something like SELECT actor_id, account_id, COUNT(account_id) GROUP BY actor_id LIMIT 10. But I am Trying to do this in Python.
I referenced this Pandas group and sort by index count but it did not work for me. Below is the code I've tried.
df['count'] = df['actor_id'].map(df['account_id'].value_counts())
df.sort_index('count', ascending=False)
In the picture, replace project_id with account_id.
Upvotes: 1
Views: 3561
Reputation: 29635
You can do:
df_nb_acc = (
df.groupby('actor_id')['account_id'] #groupby actor_id, select the column account_id
.count() # count the number of accout per actor
.reset_index() # actor_id become a column and not indexes
.rename(columns={'account_id':'Nb_account'}) # to rename the column
.sort_values('Nb_account',axis=1, ascending=False)
# to sort the value on the column Nb_account, largest to smallest
)
And to get the top 10 then do df_nb_acc.head(10)
Upvotes: 2