Resample withing a specified interval in Pandas?

Question

Update: I have updated my example to clarify my question a bit
I have a data frame with a date index and a value, like:

            date    | value |        
         ------------+-------|
 category 
    A     2016-01-04 |     6 | 
          2016-01-05 |     4 | 
          2016-01-07 |    16 | 
          2016-01-10 |     0 | 
          2016-01-12 |     7 | 
    B     2016-01-20 |     0 | 
          2016-01-22 |     8 | 
          2016-01-29 |     5 | 
          2016-01-30 |     4 |

I want to resample the data frame but between a specific interval say '2016-01-01' to '2016-01-31' for each category and fill all Na values with a zero.

This is what i want:

             date    | value |        
         ------------+-------|
 Category 
     A    2016-01-01 |     0 | 
          2016-01-02 |     0 | 
          2016-01-03 |     0 | 
          2016-01-04 |     6 | 
    .....
          2016-01-29 |     0 | 
          2016-01-30 |     0 | 
          2016-01-31 |     0 | 
     B    2016-01-01 |     0 | 
          2016-01-02 |     0 | 
          2016-01-03 |     0 | 
          2016-01-04 |     0 | 
    .....

          2016-01-29 |     5 | 
          2016-01-30 |     4 | 
          2016-01-31 |     0 |

I can make it work by this: (however its very slow. Is there a better way?) Let the above table be stored in a data frame called data

idx = pd.DatetimeIndex(start='2016-1-1', end='2016-1-31', freq='D')
df=pd.DataFrame(columns=['category','date','value'])
data.reset_index('category',inplace=True)
for a,b in data.groupby(['category']):
b.set_index('date',inplace=True)
b.reindex(idx,fill_value=0)
b.reset_index('date')
df=pd.concat([df,b])

This also wastes a lot of memory. I do not want to create the date index first and then add values. Is there a way to transform the above using resample.

IanS · Accepted Answer

A solution would be to create a dataframe with the full date range, filled with 0s:

df2 = pd.DataFrame(0, index=pd.date_range('2016-01-01', '2016-01-31'), columns=['values'])

Then use combine_first to fill it with original values when available:

df.combine_first(df2)

Output:

            values
2016-01-01       0
2016-01-02       0
2016-01-03       0
2016-01-04       6
2016-01-05       4
2016-01-06       0
2016-01-07      16

Resample withing a specified interval in Pandas?

Answers (2)

Related Questions