Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

Question

For each customer, I would like to get the cumulative sum of a column (Dollar Value) only when Timestamp 1 is less than Timestamp 2. I could do a cartesian join of the values based on Customer or iterate through the dataframe, but wanted to see if there was an easier way to to do this maybe with a groupby and apply.

DataFrame:

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

Sort Values:

df = df.sort_values(['Customer','Timestamp'])

Expected Results:

df['Desired_result'] = [0,0,0,3,0,0,3]

DataFrame

Dev Khadka · Accepted Answer

This could work

get the rows where the condition match and then do cumsum

cond = df["Timestamp"]>df["Timestamp 2"]
df["Dollar Value"].where(cond, 0).groupby([cond, df["Customer"]]).cumsum()

Edit based on you comment may be this is what you want

df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})

def sum_dollar_value(group):
    group = group.copy()
    last_row = group.iloc[-1, :]
    cond = group["Timestamp 2"]

Pandas: get the cumulative sum of a column only if the timestamp is greater than that of another column

Answers (2)

Related Questions