Reputation: 11765
Suppose I download daily stock price data using pandas:
df = web.DataReader('YHOO', 'yahoo', '1/1/2004', '1/31/2004')['Close']
I want closing prices on the first trading day of each week. This is typically Monday, but will be Tuesday if Monday is a holiday.
So I try resampling:
df.resample('W-MON', how='last')
and get this:
Date
2004-01-05 46.900002
2004-01-12 49.740002
2004-01-19 48.110001
2004-01-26 48.160000
2004-02-02 46.980000
Pandas is very cleverly giving me data with only Mondays. But 2004-01-19
was a holiday, and is actually not in df
:
Date
2004-01-02 45.400002
2004-01-05 46.900002
...
2004-01-15 48.090000
2004-01-16 48.110001
2004-01-20 47.660000
2004-01-21 47.380001
...
Is there a way to have it give the first date each week from the dataframe that is on or after a Monday? (So, the result above, except with 2004-01-19
replaced with 2004-01-20
.)
Upvotes: 4
Views: 1839
Reputation: 880887
If we choose to move missing dates such as 2004-01-19
back to the last date within the resampling time period (e.g. 2004-01-16
), then that could be accomplished by applying the resampling
not only to the closing prices, but also to the Date
itself:
import pandas as pd
import pandas.io.data as pdata
df = pdata.get_data_yahoo('YHOO', start='2004-1-1', end='2004-1-31')
df['Date'] = df.index
df = df[['Close', 'Date']]
result = df.resample('W-MON', how='last')
result = result.set_index('Date')
yields
Close
Date
2004-01-05 46.900002
2004-01-12 49.740002
2004-01-16 48.110001
2004-01-26 48.160000
2004-01-30 46.980000
Since df
has a Date
column, df.resample('W-MON', how='last')
finds both the last price and the last Date in each resampling group.
To solve the original problem, where missing dates such as 2004-01-19
are replaced by the next date present in df
, you could use
In [343]: df.index.searchsorted(result.index)
Out[343]: array([ 1, 6, 11, 15, 20])
to find the ordinal index where the dates in result.index
would "fit" into df.index
so as to maintain sorted order. These indices tell us the date in df.index
which comes on or after the corresponding date in result.index
:
In [349]: df.iloc[[1,6,11,15]].index
Out[349]: DatetimeIndex(['2004-01-05', '2004-01-12', '2004-01-20', '2004-01-26'], dtype='datetime64[ns]', name=u'Date', freq=None)
Then use these ordinal indices to reassign dates from df.index
back into result.index
:
import pandas as pd
import pandas.io.data as pdata
df = pdata.get_data_yahoo('YHOO', start='2004-1-1', end='2004-1-31')['Close']
result = df.resample('W-MON', how='last')
idx = df.index.searchsorted(result.index)
# np.clip reduces the index by 1 if a date in result.index comes after all dates in df.index
idx = np.clip(idx, 0, len(df)-1)
result.index = df.iloc[idx].index
yields
Date
2004-01-05 46.900002
2004-01-12 49.740002
2004-01-20 48.110001
2004-01-26 48.160000
2004-01-30 46.980000
Name: Close, dtype: float64
Note that the last date gets move to 2004-01-30
because 2004-02-02
is also not in df.index
and the last available date in df.index
is 2004-01-30
.
Upvotes: 3
Reputation: 37938
You could just query for any day-of-week that is less than the previous date:
weekdays = np.array(map(lambda x: x.weekday(), df.index))
df[weekdays < np.roll(weekdays, 1)]
This works because the weekday
starts at 0
for Monday and increments from there. So the first day of the week will have a weekday
that is less than the previous date.
Upvotes: 0