Reputation: 3991
Say I've got the following DataFrame
:
In [46]: timestamp = pd.date_range(start='1/1/2018', end='1/2/2018', freq='T')
In [47]: df = pd.DataFrame(timestamp, columns=['timestamp'])
In [48]: df['user'] = np.random.randint(10, size=len(timestamp))
In [49]: df['clicked'] = np.random.randint(2, size=len(timestamp))
Where each row represent whether a user
has clicked
in a given timestamp
.
I'd like to add a new column, where the value for each row will be the number of times the user of the current row has clicked
(i.e has value of 1
) in the past hour from the current row timestamp
.
How can this be done?
Upvotes: 1
Views: 70
Reputation: 6166
Try:
df = df.set_index('timestamp')
hour_count = df.groupby(['user'])['clicked'].rolling('1H').sum()
df =df.assign(rolling_sum=hour_count.reset_index(level=0, drop=True)).reset_index()
Upvotes: 2