Reputation: 1945
I have the following dataframe:
print(df)
day month year quantity
6 04 2018 10
8 04 2018 8
12 04 2018 8
I would like to create a column, sum of the "quantity" over the next "n" days, as it follows:
n = 2
print(df1)
day month year quantity final_quantity
6 04 2018 10 10 + 0 + 8 = 18
8 04 2018 8 8 + 0 + 0 = 8
12 04 2018 8 8 + 0 + 0 = 8
Specifically, summing 0 if the product has not been sold in the next "n" days. I tried rolling sums from Pandas, but does not seem to work on different columns:
n = 2
df.quantity[::-1].rolling(n + 1, min_periods=1).sum()[::-1]
Upvotes: 1
Views: 156
Reputation: 153460
You can use set_index
and rolling
with sum
:
df_out = df.set_index(pd.to_datetime(df['month'].astype(str)+
df['day'].astype(str)+
df['year'].astype(str), format='%m%d%Y'))['quantity']
d1 = df_out.resample('D').asfreq(fill_value=0)
d2 = d1[::-1].reset_index()
df['final_quantity'] = d2['quantity'].rolling(3, min_periods=1).sum()[::-1].to_frame()\
.set_index(d1.index)\
.reindex(df_out.index).values
Output:
day month year quantity final_quantity
0 6 4 2018 10 18.0
1 8 4 2018 8 8.0
2 12 4 2018 8 8.0
Upvotes: 1
Reputation: 164703
You can use a list comprehension:
import pandas as pd
df['DateTime'] = pd.to_datetime(df[['year', 'month', 'day']])
df['final_quantity'] = [df.loc[df['DateTime'].between(d, d+pd.Timedelta(days=2)), 'quantity'].sum() \
for d in df['DateTime']]
print(df)
# day month year quantity DateTime final_quantity
# 0 6 4 2018 10 2018-04-06 18
# 1 8 4 2018 8 2018-04-08 8
# 2 12 4 2018 8 2018-04-12 8
Upvotes: 1