Reputation: 189
I know the Pandas .resample()
method can down sample using a regular interval. What if I have irregular intervals? This code doesn't work. I don't want to reinvent the wheel if a function already exists.
d = {'ticker': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
'date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'return': [.01, .011, .012, .013, .014, -.02, -.01, .01, .02, -.03]}
df = pd.DataFrame(d)
dates = ['2021-01-03', '2021-01-05']
df.resample(dates).sum()
# ValueError: Invalid frequency
What I want.
ticker | date | return |
---|---|---|
A | 2021-01-03 | 0.033 |
A | 2021-01-05 | 0.027 |
B | 2021-01-03 | -0.02 |
B | 2021-01-05 | -0.01 |
Upvotes: 0
Views: 217
Reputation: 15872
You can use the following:
>>> mask = df.date.where(df['date'].isin(dates)).bfill()
>>> df.groupby([df.ticker, mask]).sum().reset_index()
ticker date return
0 A 2021-01-03 0.033
1 A 2021-01-05 0.027
2 B 2021-01-03 -0.020
3 B 2021-01-05 -0.010
Upvotes: 1