Reputation: 409
I have this data frame:
import pandas as pd
df = pd.DataFrame({'day': [1, 2, 1, 4, 2, 3],
                   'user': ['A', 'B', 'B', 'B', 'A', 'A'],
                   'num_posts': [1, 2, 3, 4, 5, 6]})
I want a new column containing, for each row, that user's total number of posts on all earlier days (that is, up to but excluding the day of that row). What I want looks like this:
user  day  num_posts  total_todate
A     1    1          0
B     2    2          3
B     1    3          0
B     4    4          5
A     2    5          1
A     3    6          6
Any ideas?
Upvotes: 1
Views: 40
Reputation: 214977
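You can sort the data frame by day, group by user, calculate the cumulative sum of the num_posts column, and then shift it down by 1 so that each row only sees the posts from earlier days. The first row of each user has nothing before it, so fill it with 0. The result is aligned back to df by index, so the original row order is preserved: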
df['total_todate'] = (df.sort_values('day')
                        .groupby('user').num_posts
                        .transform(lambda p: p.cumsum().shift())
                        .fillna(0))
df
# day num_posts user total_todate
#0 1 1 A 0.0
#1 2 2 B 3.0
#2 1 3 B 0.0
#3 4 4 B 5.0
#4 2 5 A 1.0
#5 3 6 A 6.0
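As a possible variation on the same idea, here is a minimal self-contained sketch that avoids the lambda: take the per-user running total in day order and subtract each row's own num_posts. The intermediate name running is just an illustrative choice, and the sketch assumes each user has at most one row per day, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 1, 4, 2, 3],
                   'user': ['A', 'B', 'B', 'B', 'A', 'A'],
                   'num_posts': [1, 2, 3, 4, 5, 6]})

# Per-user running total in day order; this still includes the current row.
running = df.sort_values('day').groupby('user')['num_posts'].cumsum()

# Subtract the row's own posts to exclude the current day.
# Index alignment puts each value back on its original row.
# Assumption: at most one row per user per day.
df['total_todate'] = running - df['num_posts']

print(df)
```

Because no missing values are introduced, the new column keeps an integer dtype instead of becoming float, which is closer to the desired output in the question.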
Upvotes: 2