Reputation: 89
I have a dataframe like this
df
order_date amount
0 2015-10-02 1
1 2015-12-21 15
2 2015-12-24 3
3 2015-12-26 4
4 2015-12-27 5
5 2015-12-28 10
I would like to sum on df["amount"] based on range from df["order_date"] to df["order_date"] + 6 days
order_date amount sum
0 2015-10-02 1 1
1 2015-12-21 15 27 //comes from 15 + 3 + 4 + 5
2 2015-12-24 3 22 //comes from 3 + 4 + 5 + 10
3 2015-12-26 4 19
4 2015-12-27 5 15
5 2015-12-28 10 10
the data type of order_date is datetime have tried to use iloc but it did not work well... if anyone has any idea/example on who to work on this, please kindly let me know.
Upvotes: 6
Views: 3459
Reputation: 6114
If pandas rolling allowed left-aligned window (default is right-aligned) then the answer would be a simple single liner: df.set_index('order_date').amount.rolling('7d',min_periods=1,align='left').sum()
, however forward-looking has not been implemented yet (i.e. rolling
does not accept an align
parameter). So, the trick I came up with is to "reverse" the dates temporarily. Solution:
df.index = pd.to_datetime(pd.datetime.now() - df.order_date)
df['sum'] = df.sort_index().amount.rolling('7d',min_periods=1).sum()
df.reset_index(drop=True)
Output:
order_date amount sum
0 2015-10-02 1 1.0
1 2015-12-21 15 27.0
2 2015-12-24 3 22.0
3 2015-12-26 4 19.0
4 2015-12-27 5 15.0
5 2015-12-28 10 10.0
Upvotes: 3
Reputation: 12515
import datetime
df['order_date'] = pd.to_datetime(df['order_date'], format='%Y-%m-%d')
df.set_index(['order_date'], inplace=True)
# Sum rows within the range of six days in the future
d = {t: df[(df.index >= t) & (df.index <= t + datetime.timedelta(days=6))]['amount'].sum()
for t in df.index}
# Assign the summed values back to the dataframe
df['amount_sum'] = [d[t] for t in df.index]
df
is now:
amount amount_sum
order_date
2015-10-02 1.0 1.0
2015-12-21 15.0 27.0
2015-12-24 3.0 22.0
2015-12-26 4.0 19.0
2015-12-27 5.0 15.0
2015-12-28 10.0 10.0
Upvotes: 0
Reputation: 6556
Set order_date
to be DatetimeIndex
, so that you can use df.ix[time1:time2]
to get the time range rows, then filter amount
column and sum them.
You can try with :
from datetime import timedelta
df = pd.read_fwf('test2.csv')
df.order_date = pd.to_datetime(df.order_date)
df =df.set_index(pd.DatetimeIndex(df['order_date']))
sum_list = list()
for i in range(len(df)):
sum_list.append(df.ix[df.ix[i]['order_date']:(df.ix[i]['order_date'] + timedelta(days=6))]['amount'].sum())
df['sum'] = sum_list
df
Output:
order_date amount sum
2015-10-02 2015-10-02 1 1
2015-12-21 2015-12-21 15 27
2015-12-24 2015-12-24 3 22
2015-12-26 2015-12-26 4 19
2015-12-27 2015-12-27 5 15
2015-12-28 2015-12-28 10 10
Upvotes: 0
Reputation: 323226
There is my way for this problem. It works.. (I believe there should be a much better way to do this.)
import pandas as pd
df['order_date']=pd.to_datetime(pd.Series(df.order_date))
Temp=pd.DataFrame(pd.date_range(start='2015-10-02', end='2017-01-01'),columns=['STDate'])
Temp=Temp.merge(df,left_on='STDate',right_on='order_date',how='left')
Temp['amount']=Temp['amount'].fillna(0)
Temp.sort(['STDate'],ascending=False,inplace=True)
Temp['rolls']=pd.rolling_sum(Temp['amount'],window =7,min_periods=0)
Temp.loc[Temp.STDate.isin(df.order_date),:].sort(['STDate'],ascending=True)
STDate Unnamed: 0 order_date amount rolls
0 2015-10-02 0.0 2015-10-02 1.0 1.0
80 2015-12-21 1.0 2015-12-21 15.0 27.0
83 2015-12-24 2.0 2015-12-24 3.0 22.0
85 2015-12-26 3.0 2015-12-26 4.0 19.0
86 2015-12-27 4.0 2015-12-27 5.0 15.0
87 2015-12-28 5.0 2015-12-28 10.0 10.0
Upvotes: 0
Reputation: 398
Expanding on my comment:
from datetime import timedelta
df['sum'] = 0
for i in range(len(df)):
dt1 = df['order_date'][i]
dt2 = dt1 + timedelta(days=6)
df['sum'][i] = sum(df['amount'][(df['order_date'] >= dt1) & (df['order_date'] <= dt2)])
There's probably a much better way to do this but it works...
Upvotes: 0