Brad Solomon
Brad Solomon

Reputation: 40878

Indexing & applying func to pandas DataFrame with a DateOffset

Take the following DataFrame of normally distributed terms with a small drift:

np.random.seed(123)
df = pd.DataFrame(np.random.randn(60,3) / 100 + 0.005, 
                  index=pd.date_range(end='2017-06-30', periods=60, freq='M'))

and define a function rollup that will be applied to the columns:

def rollup(r):
    return r.add(1.).prod() -1.

For example, calling on the entire df gives me:

print(rollup(df))
0    0.17411
1    0.35658
2    0.24944
dtype: float64

But what I'd like to do is take a date offset from the final date in the index of df, and apply rollup to that sub-frame. I'm getting to this correctly, as below, but wondering if there is an alternative method using a few less lines.

from pandas.tseries.offsets import DateOffset
end = df.index[-1]

start = end - DateOffset(years=2)    
print(df[start:end].apply(rollup))
0    0.07905
1    0.18037
2    0.09656
dtype: float64

# example 2
start = end - DateOffset(months=6)    
print(df[start:end].apply(rollup))
0    0.01656
1    0.06585
2    0.01463
dtype: float64

Can this final bit of code be condensed? Is there another method within Time Series / Date functionality that doesn't require me to specify end, apply a DateOffset, and then index df between the two?

If this is the most straightforward method requiring the least code, that to me is an answer in itself.

Upvotes: 1

Views: 111

Answers (1)

piRSquared
piRSquared

Reputation: 294288

To start, I'd say your code is pretty concise. I'll make this observation and suggestion:

Your index was created with a frequency of 'M' and that carries over to each individual Timestamp. What that means is that those objects now know how to process addition and subtraction of integers. What pandas does is assumes the integer takes on the offset represented by the frequency.

Example

df.index[-1]

Timestamp('2017-06-30 00:00:00', freq='M')

And

df.index[-1] - 2

Timestamp('2017-04-30 00:00:00', freq='M')

We can use this to accomplish your goals with

rollup(df[df.index[-1] - 2:])  # last 2 months

And

rollup(df[df.index[-1] - 24:])  # last 2 years

Extra Credit

This math also works on the entire index.

df.index + 2

Adds two months to each index value.

Upvotes: 1

Related Questions