Reputation: 287
I want to add a new column called prev_message_left_count which, for each row, counts how many messages were left for that ID before the current date and time. Basically, I want a column that says how many times we had left a message on a call to that customer prior to the current row's date and call time. This is what my data frame looks like:
date ID call_time message_left
20191101 1 8:00 0
20191102 2 9:00 1
20191030 1 16:00 1
20191103 2 10:30 1
20191105 2 14:00 0
20191030 1 15:30 0
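For reference, a reproducible version of this sample frame can be built as below (the dtypes are an assumption: date as an integer and call_time as a plain string; in practice a real datetime column would sort more reliably):

import pandas as pd

# sample data from the question; date kept as int and call_time as string (assumed dtypes)
df = pd.DataFrame({
    'date': [20191101, 20191102, 20191030, 20191103, 20191105, 20191030],
    'ID': [1, 2, 1, 2, 2, 1],
    'call_time': ['8:00', '9:00', '16:00', '10:30', '14:00', '15:30'],
    'message_left': [0, 1, 1, 1, 0, 0],
})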
I want to add an additional column called prev_message_left_count:
date ID call_time message_left prev_message_left_count
20191101 1 8:00 0 1
20191102 2 9:00 1 0
20191030 1 16:00 1 0
20191103 2 10:30 1 1
20191105 2 14:00 0 2
20191030 1 15:30 0 0
My dataframe has 15 columns and 90k rows. It also has other columns such as 'No Message Left' and 'Responded', for which I will have to compute additional columns called 'Previous_no_message_left' and 'prev_responded', similar to 'prev_message_left'.
Upvotes: 2
Views: 64
Reputation: 30920
Use DataFrame.sort_values to get the cumulative sum in the correct order within each group. You can create the groups with DataFrame.groupby:
df['prev_message_left_count'] = (df.sort_values(['date', 'call_time'])
                                   .groupby('ID')['message_left']
                                   .apply(lambda x: x.shift(fill_value=0).cumsum()))
print(df)
date ID call_time message_left prev_message_left_count
0 20191101 1 8:00 0 1
1 20191102 2 9:00 1 0
2 20191030 1 16:00 1 0
3 20191103 2 10:30 1 1
4 20191105 2 14:00 0 2
5 20191030 1 15:30 0 0
Sometimes GroupBy.apply is slow, so it may be advisable to avoid it:
df['prev_message_left_count'] = (df.sort_values(['date', 'call_time'])
                                   .groupby('ID')
                                   .shift(fill_value=0)
                                   .groupby(df['ID'])['message_left']
                                   .cumsum())
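Since the question also mentions 'No Message Left' and 'Responded', the same shift-then-cumsum idea can be repeated for several flag columns in a loop. A minimal sketch, assuming the flag columns are named message_left, no_message_left and responded (adjust to the real column names in your 15-column frame):

# assumed flag column names; swap in the real ones
flag_cols = ['message_left', 'no_message_left', 'responded']

ordered = df.sort_values(['date', 'call_time'])
for col in flag_cols:
    # within each ID: shift so the current call is excluded, then take a running total
    df[f'prev_{col}'] = (ordered.groupby('ID')[col]
                                .transform(lambda s: s.shift(fill_value=0).cumsum()))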
Upvotes: 1