CoderBC
CoderBC

Reputation: 1392

Resample withing a specified interval in Pandas?

Update: I have updated my example to clarify my question a bit
I have a data frame with a date index and a value, like:

            date    | value |        
         ------------+-------|
 category 
    A     2016-01-04 |     6 | 
          2016-01-05 |     4 | 
          2016-01-07 |    16 | 
          2016-01-10 |     0 | 
          2016-01-12 |     7 | 
    B     2016-01-20 |     0 | 
          2016-01-22 |     8 | 
          2016-01-29 |     5 | 
          2016-01-30 |     4 | 

I want to resample the data frame but between a specific interval say '2016-01-01' to '2016-01-31' for each category and fill all Na values with a zero.

This is what i want:

             date    | value |        
         ------------+-------|
 Category 
     A    2016-01-01 |     0 | 
          2016-01-02 |     0 | 
          2016-01-03 |     0 | 
          2016-01-04 |     6 | 
    .....
          2016-01-29 |     0 | 
          2016-01-30 |     0 | 
          2016-01-31 |     0 | 
     B    2016-01-01 |     0 | 
          2016-01-02 |     0 | 
          2016-01-03 |     0 | 
          2016-01-04 |     0 | 
    .....

          2016-01-29 |     5 | 
          2016-01-30 |     4 | 
          2016-01-31 |     0 | 

I can make it work by this: (however its very slow. Is there a better way?) Let the above table be stored in a data frame called data

idx = pd.DatetimeIndex(start='2016-1-1', end='2016-1-31', freq='D')
df=pd.DataFrame(columns=['category','date','value'])
data.reset_index('category',inplace=True)
for a,b in data.groupby(['category']):
b.set_index('date',inplace=True)
b.reindex(idx,fill_value=0)
b.reset_index('date')
df=pd.concat([df,b])

This also wastes a lot of memory. I do not want to create the date index first and then add values. Is there a way to transform the above using resample.

Upvotes: 3

Views: 2422

Answers (2)

Alexander
Alexander

Reputation: 109528

I believe you want to specify your start and end dates and then reindex your resampled data (Pandas 0.17+).

# Sample data.
df = pd.DataFrame({'a': range(5), 'b': range(5)}, index=pd.DatetimeIndex(start='2016-1-1', periods=5, freq='D'))

idx = pd.DatetimeIndex(start='2016-1-1', end='2016-2-29', freq='D')  # freq='B' for weekdays.
df.reindex(idx, fill_value=0).head(8)
            a  b
2016-01-01  0  0
2016-01-02  1  1
2016-01-03  2  2
2016-01-04  3  3
2016-01-05  4  4
2016-01-06  0  0
2016-01-07  0  0
2016-01-08  0  0

Upvotes: 3

IanS
IanS

Reputation: 16241

A solution would be to create a dataframe with the full date range, filled with 0s:

df2 = pd.DataFrame(0, index=pd.date_range('2016-01-01', '2016-01-31'), columns=['values'])

Then use combine_first to fill it with original values when available:

df.combine_first(df2)

Output:

            values
2016-01-01       0
2016-01-02       0
2016-01-03       0
2016-01-04       6
2016-01-05       4
2016-01-06       0
2016-01-07      16

Upvotes: 1

Related Questions