I have a pandas DataFrame df with a daily DatetimeIndex and a column historical_sales.
Filtering for past days where historical_sales was greater than some large number, say 200, is easy enough:
df.loc[df['historical_sales'] > 200]
I'm wondering, however: what if we wanted to explore sales patterns on the 5 days preceding AND following the days when sales were > 200?
Many thanks.
Upvotes: 1
Views: 365
I think you need to get all the index values with a list comprehension and then select with loc.
It is also necessary to use numpy.concatenate to join all the indexes together and numpy.unique to remove the duplicates.
import numpy as np
import pandas as pd

np.random.seed(100)
rng = pd.date_range('2017-04-03', periods=20)
df = pd.DataFrame({'historical_sales': np.random.choice([100,200,300], size=20)}, index=rng)
print (df)
historical_sales
2017-04-03 100
2017-04-04 100
2017-04-05 100
2017-04-06 300
2017-04-07 300
2017-04-08 100
2017-04-09 300
2017-04-10 200
2017-04-11 300
2017-04-12 300
2017-04-13 300
2017-04-14 300
2017-04-15 200
2017-04-16 100
2017-04-17 100
2017-04-18 100
2017-04-19 100
2017-04-20 300
2017-04-21 100
2017-04-22 200
idxmask = df.index[df['historical_sales']>200]
print (idxmask)
DatetimeIndex(['2017-04-06', '2017-04-07', '2017-04-09', '2017-04-11',
'2017-04-12', '2017-04-13', '2017-04-14', '2017-04-20'],
dtype='datetime64[ns]', freq=None)
# in real data, change 1 to 5 for a 5-day window
temp_index = [df.loc[timestamp - pd.Timedelta(1, unit='d'):
                     timestamp + pd.Timedelta(1, unit='d')].index for timestamp in idxmask]
idx = np.unique(np.concatenate(temp_index))
df1 = df.loc[idx]
print (df1)
historical_sales
2017-04-05 100
2017-04-06 300
2017-04-07 300
2017-04-08 100
2017-04-09 300
2017-04-10 200
2017-04-11 300
2017-04-12 300
2017-04-13 300
2017-04-14 300
2017-04-15 200
2017-04-19 100
2017-04-20 300
2017-04-21 100
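As an aside (not part of the original answer), the same selection can be done without the list comprehension: treat the > 200 condition as a boolean mask and take a centered rolling maximum over it, which flags every row within 5 rows of a spike. This sketch assumes a gap-free daily index, so 5 rows equals 5 days:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
rng = pd.date_range('2017-04-03', periods=20)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 200, 300], size=20)},
                  index=rng)

# Boolean mask marking the spike days
spike = df['historical_sales'] > 200
# Centered rolling max over 11 rows (5 before + the day itself + 5 after)
# flags every row within 5 rows of a spike
near = spike.astype(int).rolling(11, center=True, min_periods=1).max().astype(bool)
df1 = df[near]
```

If the index had gaps, the row-based window would no longer correspond to calendar days, and the Timedelta-slicing approach above would be the safer choice.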
Upvotes: 1
For the sake of clarity, I am adding a new column which is set to 1 for the rows of interest. For easy verification, the window has been kept to 1 day instead of 5 in the code below.
import pandas as pd
import numpy as np
from datetime import timedelta

df = pd.DataFrame(data=np.random.rand(51), index=pd.date_range('2015-04-20', '2015-06-09'), columns=['A'])
idx = df[df.A > 0.5].index
df["new"] = 0
for date in idx:
    current_date = date.to_pydatetime()
    start = current_date - timedelta(days=1)
    end = current_date + timedelta(days=1)
    # assign through .loc with the column label to avoid chained assignment,
    # which would silently fail to modify df
    df.loc[start:current_date, "new"] = 1
    df.loc[current_date:end, "new"] = 1
print(df)
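Once the marker column is filled, pulling out the window rows is a one-line boolean filter. A compact, self-contained sketch of the same idea (the setup below re-creates similar data, again with a ±1 day window):

```python
import numpy as np
import pandas as pd
from datetime import timedelta

np.random.seed(1)
df = pd.DataFrame(data=np.random.rand(51),
                  index=pd.date_range('2015-04-20', '2015-06-09'),
                  columns=['A'])
df['new'] = 0
for date in df[df.A > 0.5].index:
    # flag the day itself plus one day on either side
    df.loc[date - timedelta(days=1): date + timedelta(days=1), 'new'] = 1

# the flagged window rows are then a plain filter
windowed = df[df['new'] == 1]
```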
Upvotes: 0
When I need to work with rows before and after, I simply perform a shift.
df['preceding_5th_day'] = df['historical_sales'].shift(5)
df['following_5th_day'] = df['historical_sales'].shift(-5)
Then, you can simply make your check and do
df.loc[df['historical_sales'] > 200]
The selected rows will then also have columns for the preceding and following 5th day. Pretty straightforward this way.
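Note that the two shifts above capture only the 5th day before and after each spike. To flag every row within 5 days of a spike, the same shift idea can be OR-ed over the whole range (a sketch, not from the original answer, assuming a gap-free daily index so positions equal days):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 300], size=15)},
                  index=pd.date_range('2017-01-01', periods=15))

spike = df['historical_sales'] > 200
# OR together shifts from -5 to +5: a row is kept if any row within
# 5 positions of it spiked
near = np.logical_or.reduce([spike.shift(k, fill_value=False) for k in range(-5, 6)])
result = df[near]
```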
Upvotes: 0
You would want to do range slicing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
It should look something like this:
great_sales_df = df.loc[df['historical_sales'] > 200]
for sales_date in great_sales_df.index:
    start = sales_date + pd.DateOffset(-5)
    end = sales_date + pd.DateOffset(5)
    pattern_df = df.loc[start:end]
This is only a rough sketch, but I think the direction is right.
Upvotes: 0