itzy
itzy

Reputation: 11765

Resample in pandas without changing date

Suppose I download daily stock price data using pandas:

df = web.DataReader('YHOO', 'yahoo', '1/1/2004', '1/31/2004')['Close']

I want closing prices on the first trading day of each week. This is typically Monday, but will be Tuesday if Monday is a holiday.

So I try resampling:

df.resample('W-MON', how='last')

and get this:

Date
2004-01-05    46.900002
2004-01-12    49.740002
2004-01-19    48.110001
2004-01-26    48.160000
2004-02-02    46.980000

Pandas is very cleverly giving me data with only Mondays. But 2004-01-19 was a holiday, and is actually not in df:

Date
2004-01-02    45.400002
2004-01-05    46.900002
...
2004-01-15    48.090000
2004-01-16    48.110001
2004-01-20    47.660000
2004-01-21    47.380001
...

Is there a way to have it give the first date each week from the dataframe that is on or after a Monday? (So, the result above, except with 2004-01-19 replaced with 2004-01-20.)

Upvotes: 4

Views: 1839

Answers (2)

unutbu
unutbu

Reputation: 880887

If we choose to move missing dates such as 2004-01-19 back to the last date within the resampling time period (e.g. 2004-01-16), then that could be accomplished by applying the resampling not only to the closing prices, but also to the Date itself:

import pandas as pd
import pandas.io.data as pdata

df = pdata.get_data_yahoo('YHOO', start='2004-1-1', end='2004-1-31')
df['Date'] = df.index
df = df[['Close', 'Date']]

result = df.resample('W-MON', how='last')
result = result.set_index('Date')

yields

                Close
Date                 
2004-01-05  46.900002
2004-01-12  49.740002
2004-01-16  48.110001
2004-01-26  48.160000
2004-01-30  46.980000

Since df has a Date column, df.resample('W-MON', how='last') finds both the last price and the last Date in each resampling group.


To solve the original problem, where missing dates such as 2004-01-19 are replaced by the next date present in df, you could use

In [343]: df.index.searchsorted(result.index)
Out[343]: array([ 1,  6, 11, 15, 20])

to find the ordinal index where the dates in result.index would "fit" into df.index so as to maintain sorted order. These indices tell us the date in df.index which comes on or after the corresponding date in result.index:

In [349]: df.iloc[[1,6,11,15]].index
Out[349]: DatetimeIndex(['2004-01-05', '2004-01-12', '2004-01-20', '2004-01-26'], dtype='datetime64[ns]', name=u'Date', freq=None)

Then use these ordinal indices to reassign dates from df.index back into result.index:

import pandas as pd
import pandas.io.data as pdata

df = pdata.get_data_yahoo('YHOO', start='2004-1-1', end='2004-1-31')['Close']

result = df.resample('W-MON', how='last')
idx = df.index.searchsorted(result.index)
# np.clip reduces the index by 1 if a date in result.index comes after all dates in df.index
idx = np.clip(idx, 0, len(df)-1)
result.index = df.iloc[idx].index

yields

Date
2004-01-05    46.900002
2004-01-12    49.740002
2004-01-20    48.110001
2004-01-26    48.160000
2004-01-30    46.980000
Name: Close, dtype: float64

Note that the last date gets move to 2004-01-30 because 2004-02-02 is also not in df.index and the last available date in df.index is 2004-01-30.

Upvotes: 3

chrisaycock
chrisaycock

Reputation: 37938

You could just query for any day-of-week that is less than the previous date:

weekdays = np.array(map(lambda x: x.weekday(), df.index))
df[weekdays < np.roll(weekdays, 1)]

This works because the weekday starts at 0 for Monday and increments from there. So the first day of the week will have a weekday that is less than the previous date.

Upvotes: 0

Related Questions