Reputation: 1392
Update: I have updated my example to clarify my question a bit
I have a data frame with a date index and a value, like:
date | value |
------------+-------|
category
A 2016-01-04 | 6 |
2016-01-05 | 4 |
2016-01-07 | 16 |
2016-01-10 | 0 |
2016-01-12 | 7 |
B 2016-01-20 | 0 |
2016-01-22 | 8 |
2016-01-29 | 5 |
2016-01-30 | 4 |
I want to resample the data frame but between a specific interval say '2016-01-01' to '2016-01-31' for each category and fill all Na values with a zero.
This is what i want:
date | value |
------------+-------|
Category
A 2016-01-01 | 0 |
2016-01-02 | 0 |
2016-01-03 | 0 |
2016-01-04 | 6 |
.....
2016-01-29 | 0 |
2016-01-30 | 0 |
2016-01-31 | 0 |
B 2016-01-01 | 0 |
2016-01-02 | 0 |
2016-01-03 | 0 |
2016-01-04 | 0 |
.....
2016-01-29 | 5 |
2016-01-30 | 4 |
2016-01-31 | 0 |
I can make it work by this: (however its very slow. Is there a better way?) Let the above table be stored in a data frame called data
idx = pd.DatetimeIndex(start='2016-1-1', end='2016-1-31', freq='D')
df=pd.DataFrame(columns=['category','date','value'])
data.reset_index('category',inplace=True)
for a,b in data.groupby(['category']):
b.set_index('date',inplace=True)
b.reindex(idx,fill_value=0)
b.reset_index('date')
df=pd.concat([df,b])
This also wastes a lot of memory. I do not want to create the date index first and then add values. Is there a way to transform the above using resample.
Upvotes: 3
Views: 2422
Reputation: 109528
I believe you want to specify your start and end dates and then reindex your resampled data (Pandas 0.17+).
# Sample data.
df = pd.DataFrame({'a': range(5), 'b': range(5)}, index=pd.DatetimeIndex(start='2016-1-1', periods=5, freq='D'))
idx = pd.DatetimeIndex(start='2016-1-1', end='2016-2-29', freq='D') # freq='B' for weekdays.
df.reindex(idx, fill_value=0).head(8)
a b
2016-01-01 0 0
2016-01-02 1 1
2016-01-03 2 2
2016-01-04 3 3
2016-01-05 4 4
2016-01-06 0 0
2016-01-07 0 0
2016-01-08 0 0
Upvotes: 3
Reputation: 16241
A solution would be to create a dataframe with the full date range, filled with 0s:
df2 = pd.DataFrame(0, index=pd.date_range('2016-01-01', '2016-01-31'), columns=['values'])
Then use combine_first
to fill it with original values when available:
df.combine_first(df2)
Output:
values
2016-01-01 0
2016-01-02 0
2016-01-03 0
2016-01-04 6
2016-01-05 4
2016-01-06 0
2016-01-07 16
Upvotes: 1