Dropping rows of DataFrame: Faster alternative for groupby.apply

Question

I've got a sorted (by "customer") DataFrame which looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"customer": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3], 
                   "revenue": [np.nan, np.nan, 4.23, np.nan, 5.1, 523, np.nan, 4.2, np.nan, 23, 
                               np.nan, np.nan, np.nan, np.nan]})

My goal is to get rid of the last row(s) of a customer if it's not a row with a positive revenue. So, if for a specific customer no row with a positive revenue is following then the row has to be deleted. In our example this would be the last row of customer 1 and the last two rows of customer 2.

In addition, I'd like to get rid of all customers which didn't buy at all. In our example, this would be customer 3. In the following picture I highlighted the rows which shall be deleted:

At the moment I'm using the following syntax to accomplish this task:

def deleting(sub_df):
    # Delete customers that didn't buy
    if sub_df["revenue"].sum() == 0:
        return None

    # Delte all rows after the last purchase
    sub_df.loc[:, "dropping"] = sub_df.loc[:, "revenue"].fillna(method="bfill")
    sub_df = sub_df.loc[~sub_df["dropping"].isnull(), :]
    sub_df = sub_df.drop("dropping", axis="columns")
    return sub_df

df.groupby("customer").apply(deleting).reset_index(drop=True)

I feel that there has to be a better solution since it feels weired to use a groupby to delete rows of a dataframe. Also, I need a faster solution since my dataframe has 1.8 million rows. So, performance is an issue.

Thanks for your help!

Quang Hoang · Accepted Answer

You can use bfill on groupby:

df[df.groupby('customer').revenue.bfill().notnull()]

Output:

0         1      NaN
1         1      NaN
2         1     4.23
3         1      NaN
4         1     5.10
5         1   523.00
7         2     4.20
8         2      NaN
9         2    23.00

Dropping rows of DataFrame: Faster alternative for groupby.apply

Answers (1)

Related Questions