codingknob
codingknob

Reputation: 11660

pandas re-indexing with missing dates

from dateutil.rrule import rrule, MONTHLY

def fread_year_month(strt_dt, end_dt):
        dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
        return dates

df = pd.DataFrame({
'value' : [4,2,5,6,7,8,6,5,4,1,2,4],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 12, 1)),
'stock': ['amzn']*12
},columns=[
'value', 'date', 'stock'] )

df2 = pd.DataFrame({
'value' : [1,1,1,1,1],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 5, 1)),
'stock': ['msft']*5
},columns=[
'value', 'date', 'stock'] )

df = df.append(df2)

df.set_index(['stock', 'date'], inplace=True)

I have the above pandas dataframe. As you can see the number of available data for amzn is not the same as msft. In this example the dates are sequential but it need not be the case (the dates can be any date).

If the universe of available dates are the dates for which I have data for AMZN then how can I add those exact dates for every other stock in my data frame with a NaN or NA.

In the example give, I want to insert the missing dates for msft in the index and insert NaN or NA for the value for those date indices.

Upvotes: 0

Views: 274

Answers (2)

Alexander
Alexander

Reputation: 109546

Before your merge, you can do this (as of Pandas 0.17):

>>> df2.set_index('date').reindex(df.date)
            value stock
date                   
2015-01-01      1  msft
2015-02-01      1  msft
2015-03-01      1  msft
2015-04-01      1  msft
2015-05-01      1  msft
2015-06-01    NaN   NaN
2015-07-01    NaN   NaN
2015-08-01    NaN   NaN
2015-09-01    NaN   NaN
2015-10-01    NaN   NaN
2015-11-01    NaN   NaN
2015-12-01    NaN   NaN

Upvotes: 1

chrisb
chrisb

Reputation: 52266

If you want to work with your tickers as columns, could do an unstack, like this:

In [71]: df.unstack(level=0)
Out[71]: 
           value     
stock       amzn msft
date                 
2015-01-01   4.0  1.0
2015-02-01   2.0  1.0
2015-03-01   5.0  1.0
2015-04-01   6.0  1.0
2015-05-01   7.0  1.0
2015-06-01   8.0  NaN
2015-07-01   6.0  NaN
2015-08-01   5.0  NaN
2015-09-01   4.0  NaN
2015-10-01   1.0  NaN
2015-11-01   2.0  NaN
2015-12-01   4.0  NaN

To reindex into the same shape, the from_product below creates a new MultiIndex with all the combinations of dates / tickers.

In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
Out[75]: 
                 value
amzn 2015-01-01    4.0
     2015-02-01    2.0
     2015-03-01    5.0
     2015-04-01    6.0
     2015-05-01    7.0
     2015-06-01    8.0
     2015-07-01    6.0
     2015-08-01    5.0
     2015-09-01    4.0
     2015-10-01    1.0
     2015-11-01    2.0
     2015-12-01    4.0
msft 2015-01-01    1.0
     2015-02-01    1.0
     2015-03-01    1.0
     2015-04-01    1.0
     2015-05-01    1.0
     2015-06-01    NaN
     2015-07-01    NaN
     2015-08-01    NaN
     2015-09-01    NaN
     2015-10-01    NaN
     2015-11-01    NaN
     2015-12-01    NaN

Upvotes: 1

Related Questions