kevingduck
kevingduck

Reputation: 531

(Pandas) Working with monthly data where some months are missing

I want to be able to calculate sales forecasts using the last 12 months worth of data for each of the products I sell.

I have time series data with a product name, a month, and a number of purchases in that month. However, some months where there were no sales, there is no data for that month.

My dataframe looks like this:

2014-06  product1  100
2014-07  product1  50
2014-10  product1  120

But I want it to look like this:

2014-06  product1  100
2014-07  product1  50
2014-08  product1  
2014-09  product1  
2014-10  product1  120

with a row for every month, not just the months with data. What's the most effective way to add the rows where the given month has no sales data?

Upvotes: 0

Views: 311

Answers (1)

jme
jme

Reputation: 20695

You can use Dataframe.reindex after constructing a new PeriodIndex with the missing months, using pd.date_range and to_period(). First, I'll recreate your data, converting your months to instances of Period:

index = pd.to_datetime(['2014-06', '2014-07', '2014-10']).to_period('M')
data = pd.DataFrame({
        'name': 'product1',
        'count': [100, 50, 120]
    }, index=index)

Now we create a new index with all of the months in the range:

new_index = pd.date_range(
    start=index[0].to_timestamp(how='end'),
    end=index[-1].to_timestamp(how='end'),
    freq='M').to_period()

This looks like:

>>> new_index
PeriodIndex(['2014-06', '2014-07', '2014-08', '2014-09', '2014-10'],
            dtype='int64', freq='M')

Such that:

>>> res = data.reindex(new_index, method='backfill')
>>> res

         count      name
2014-06    100  product1
2014-07     50  product1
2014-08    120  product1
2014-09    120  product1
2014-10    120  product1

You'll notice that both the name and count have been backfilled, whereas you only wanted the name to be backfilled. We can set the count for the new rows to be NaN as follows:

ix = new_index.difference(index)
res.loc[ix, 'count'] = None

So that:

>>> res

         count      name
2014-06    100  product1
2014-07     50  product1
2014-08    NaN  product1
2014-09    NaN  product1
2014-10    120  product1

Upvotes: 1

Related Questions