cvax
cvax

Reputation: 436

Apply function for counting length of a DataFrame filter

What's the best way to create a new pandas column with the length of filtering of another df based on a value from the first df?

df_account has account numbers

df_retention has rows for each date an account numbers was active

I am trying to create a new column on df_account that has the total number of days the account was active. Using .apply seems extremely slow.

def retention_count(x):
    return len(df_retention[df_retention['account'] == x])

df_account['retention_total'] = df_account['account'].apply(retention_count)

On a small number of rows, this works, but when my df_account has over 750k rows it is really slow. What can I do to make this faster? Thanks.

Upvotes: 1

Views: 78

Answers (1)

Alex Zisman
Alex Zisman

Reputation: 411

You could use groupby and count the rows in the df_retention dataframe. Assuming account is your index on df_account

df_account.set_index('account',inplace=True)
df_account['retention_total'] = df_retention.groupby('account').count()

Upvotes: 1

Related Questions