azuber

Reputation: 409

Aggregate under a certain condition

I have this data frame.

import pandas as pd

df = pd.DataFrame({'day':[1,2,1,4,2,3], 'user':['A','B','B','B','A','A'],
                   'num_posts':[1,2,3,4,5,6]})

I want a new column containing, for each row, the total number of posts that user made before that row's day (that is, excluding the day itself). What I want looks like this:

user  day  num_posts  total_todate
A     1    1          0
B     2    2          3
B     1    3          0
B     4    4          5
A     2    5          1
A     3    6          6

Any ideas?

Upvotes: 1

Views: 40

Answers (1)

akuiper

Reputation: 214977

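You can sort the data frame by day, group by user, take the cumulative sum of the num_posts column within each group, and then shift it down by one row so that each row only sees earlier posts. The first row of each user has no earlier posts and becomes NaN, which fillna(0) replaces with 0: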

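df['total_todate'] = (
    df.sort_values('day')                       # process each user's rows in day order
      .groupby('user')['num_posts']
      .transform(lambda p: p.cumsum().shift())  # running total of the earlier rows only
      .fillna(0)                                # first row per user has no history
)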

df
#   day  num_posts user  total_todate
#0    1          1    A           0.0
#1    2          2    B           3.0
#2    1          3    B           0.0
#3    4          4    B           5.0
#4    2          5    A           1.0
#5    3          6    A           6.0
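
One caveat: the shift only excludes the current row, so if a user could have several rows on the same day, the other same-day rows would still be counted. The sample data has no such case, so the following is only a rough sketch under that assumption. It adds one hypothetical extra row (user A, day 2, 10 posts), sums posts per user and day first, and then merges the running total back. The names daily and result are made up for illustration. Subtracting the current value from the cumulative sum, instead of shifting, also keeps the column as integers.

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra row
# (user A, day 2, 10 posts) so that a user has two rows on the same day.
df = pd.DataFrame({'day': [1, 2, 1, 4, 2, 3, 2],
                   'user': ['A', 'B', 'B', 'B', 'A', 'A', 'A'],
                   'num_posts': [1, 2, 3, 4, 5, 6, 10]})

# One row per (user, day); groupby sorts its keys by default,
# so each user's days come out in ascending order.
daily = df.groupby(['user', 'day'], as_index=False)['num_posts'].sum()

# Running total per user minus the current day's posts
# gives the total over strictly earlier days.
daily['total_todate'] = (daily.groupby('user')['num_posts'].cumsum()
                         - daily['num_posts'])

# Attach the per-day totals to the original rows, keeping their order.
result = df.merge(daily[['user', 'day', 'total_todate']],
                  on=['user', 'day'], how='left')
print(result)
```

Both day-2 rows of user A get 1 here, since only the day-1 post precedes them.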

Upvotes: 2
