Reputation: 55
For each customer, I would like to get the cumulative sum of a column (Dollar Value) only when Timestamp 1 is less than Timestamp 2. I could do a cartesian join of the values based on Customer or iterate through the dataframe, but wanted to see if there was an easier way to to do this maybe with a groupby and apply.
DataFrame:
df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})
Sort Values:
df = df.sort_values(['Customer','Timestamp'])
Expected Results:
df['Desired_result'] = [0,0,0,3,0,0,3]
Upvotes: 2
Views: 1189
Reputation: 5451
This could work
get the rows where the condition match and then do cumsum
cond = df["Timestamp"]>df["Timestamp 2"]
df["Dollar Value"].where(cond, 0).groupby([cond, df["Customer"]]).cumsum()
Edit based on you comment may be this is what you want
df = pd.DataFrame({'Customer': ['Alice','Brian','Alice','Alice','Alice','Brian', 'Brian'], 'Timestamp': [1,2,3,4,5,3,6], 'Timestamp 2': [2,5,4,6,7,5,7], 'Dollar Value':[0,1,3,5,3,2,3]})
def sum_dollar_value(group):
group = group.copy()
last_row = group.iloc[-1, :]
cond = group["Timestamp 2"]<last_row["Timestamp"]
group.loc[last_row.name, "result"] = np.sum(group["Dollar Value"].where(cond, 0))
return group
df.groupby("Customer").apply(sum_dollar_value).reset_index(level=0, drop=True)
Upvotes: 1
Reputation: 12038
I would suggest setting a condition, then grouping by customer:
# set condition
cond = df["Timestamp"]<df["Timestamp 2"]
df[cond].groupby('Customer')['Dollar Value'].sum()
Note: I borrowed the syntax of condition from the previous answer by Dev Khadka
Upvotes: 0