Reputation: 436
What's the best way to create a new pandas column with the length of filtering of another df based on a value from the first df?
df_account
has account numbers
df_retention
has rows for each date an account numbers was active
I am trying to create a new column on df_account
that has the total number of days the account was active. Using .apply
seems extremely slow.
def retention_count(x):
return len(df_retention[df_retention['account'] == x])
df_account['retention_total'] = df_account['account'].apply(retention_count)
On a small number of rows, this works, but when my df_account
has over 750k rows it is really slow. What can I do to make this faster? Thanks.
Upvotes: 1
Views: 78
Reputation: 411
You could use groupby and count the rows in the df_retention dataframe. Assuming account
is your index on df_account
df_account.set_index('account',inplace=True)
df_account['retention_total'] = df_retention.groupby('account').count()
Upvotes: 1