Apply function for counting length of a DataFrame filter

Question

What's the best way to create a new pandas column with the length of filtering of another df based on a value from the first df?

df_account has account numbers

df_retention has rows for each date an account numbers was active

I am trying to create a new column on df_account that has the total number of days the account was active. Using .apply seems extremely slow.

def retention_count(x):
    return len(df_retention[df_retention['account'] == x])

df_account['retention_total'] = df_account['account'].apply(retention_count)

On a small number of rows, this works, but when my df_account has over 750k rows it is really slow. What can I do to make this faster? Thanks.

Alex Zisman · Accepted Answer

You could use groupby and count the rows in the df_retention dataframe. Assuming account is your index on df_account

df_account.set_index('account',inplace=True)
df_account['retention_total'] = df_retention.groupby('account').count()

Apply function for counting length of a DataFrame filter

Answers (1)

Related Questions