minnymate
minnymate

Reputation: 55

Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

For each customer, I would like to get the cumulative sum of a column (Dollar Value) only when Timestamp 1 is less than Timestamp 2. I could do a cartesian join of the values based on Customer or iterate through the dataframe, but wanted to see if there was an easier way to to do this maybe with a groupby and apply.

DataFrame:

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

Sort Values:

df = df.sort_values(['Customer','Timestamp'])

Expected Results:

df['Desired_result'] = [0,0,0,3,0,0,3]

DataFrame

Upvotes: 2

Views: 1189

Answers (2)

Dev Khadka
Dev Khadka

Reputation: 5451

This could work

get the rows where the condition match and then do cumsum

cond = df["Timestamp"]>df["Timestamp 2"]
df["Dollar Value"].where(cond, 0).groupby([cond, df["Customer"]]).cumsum()

Edit based on you comment may be this is what you want

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

def sum_dollar_value(group):
    group = group.copy()
    last_row = group.iloc[-1, :]
    cond = group["Timestamp 2"]<last_row["Timestamp"]
    group.loc[last_row.name, "result"] = np.sum(group["Dollar Value"].where(cond, 0))
    return group

df.groupby("Customer").apply(sum_dollar_value).reset_index(level=0, drop=True)

Upvotes: 1

Yaakov Bressler
Yaakov Bressler

Reputation: 12038

I would suggest setting a condition, then grouping by customer:

# set condition
cond = df["Timestamp"]<df["Timestamp 2"]
df[cond].groupby('Customer')['Dollar Value'].sum()

Note: I borrowed the syntax of condition from the previous answer by Dev Khadka

Upvotes: 0

Related Questions