qua

Reputation: 992

fill missing indices in pandas

I have data like the following:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

The missing index at November 3rd corresponds to a zero value, and I want it to look like this:

y = pd.Series([1, 2, 0, 4], pd.date_range('2013-11-01', periods=4))

What's the best way to convert x to y? I've tried

y = pd.Series(x, pd.date_range('2013-11-1', periods=4)).fillna(0)

This sometimes throws an index error that I can't interpret ("Index length did not match values", even though the index and the data have the same length). Is there a better way to do this?
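For context, a minimal sketch of what the attempted constructor call does in a recent pandas version: passing an existing Series together with a new index aligns by label, which behaves like `x.reindex(...)`, so the missing date becomes NaN before `fillna(0)` replaces it. The NaN also upcasts the result to float64, which is why it doesn't compare equal to the integer `y` without a cast. The variable names `full_range`, `attempt` and `explicit` are illustrative only, and this does not reproduce the intermittent error described above, whose cause isn't clear from the question.

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])
y = pd.Series([1, 2, 0, 4], pd.date_range('2013-11-01', periods=4))

full_range = pd.date_range('2013-11-01', periods=4)

# Passing an existing Series plus a new index aligns by label,
# equivalent to x.reindex(full_range): unmatched dates become NaN.
attempt = pd.Series(x, full_range).fillna(0)
explicit = x.reindex(full_range).fillna(0)

# The NaN introduced for 2013-11-03 upcasts both results to float64.
print(attempt.dtype, explicit.dtype)

# Casting back to int64 recovers y.
result = attempt.astype('int64')
pd.testing.assert_series_equal(result, y, check_freq=False)
```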

Upvotes: 12

Views: 14513

Answers (2)

roman

Reputation: 117345

You can use pandas.Series.resample() for this:

>>> x.resample('D').mean().fillna(0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0

Older versions of resample() had a fill_method parameter, but I don't know whether it can be used to replace NaN during resampling. It looks like you can pass your own aggregation function to take care of it instead, like:

>>> x.resample('D').apply(lambda s: s.mean() if len(s) > 0 else 0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0
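Another way to get zeros straight out of the aggregation step, sketched here assuming that summing (rather than averaging) duplicate timestamps is acceptable for your data: `Resampler.sum()` returns 0 for an empty bin when `min_count=0`, which is its default, so no separate `fillna` is needed. Setting `min_count=1` makes empty bins NaN again, which shows where the zero comes from.

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# The sum of an empty bin is 0 when min_count=0 (the default),
# so the gap on 2013-11-03 is filled with 0 directly.
summed = x.resample('D').sum()

# With min_count=1 an empty bin yields NaN instead.
summed_nan = x.resample('D').sum(min_count=1)

print(summed)
print(summed_nan)
```

Note the trade-off: unlike `mean()`, this adds up duplicate timestamps, so it only matches the other approaches when each day has at most one observation.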

I don't know which method is preferred. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 is probably the most efficient way to do this, but you'd have to benchmark it yourself.
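If you want to run those tests yourself, here is a minimal benchmarking sketch. It assumes a recent pandas and compares three approaches: `resample(...).mean().fillna(0)`, `reindex(..., fill_value=0)`, and `Series.asfreq('D', fill_value=0)` (the last one is not mentioned in either answer and is added here for comparison). The dictionary name `candidates` is illustrative. Timings depend heavily on the machine, the pandas version and the data size, so no numbers are claimed here.

```python
import timeit
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])
full_range = pd.date_range(x.index.min(), x.index.max(), freq='D')

candidates = {
    'resample': lambda: x.resample('D').mean().fillna(0),
    'reindex': lambda: x.reindex(full_range, fill_value=0),
    'asfreq': lambda: x.asfreq('D', fill_value=0),
}

# Sanity check: every approach yields the same values (dtypes may differ).
results = {name: f() for name, f in candidates.items()}
for name, res in results.items():
    assert res.tolist() == [1, 2, 0, 4], name

# Time each approach; try a larger series to get a meaningful comparison.
for name, f in candidates.items():
    seconds = timeit.timeit(f, number=200)
    print(f'{name}: {seconds:.4f} s per 200 calls')
```

One design difference worth knowing: `reindex` and `asfreq` keep the integer dtype because no NaN is ever introduced, whereas the `resample` route goes through NaN and ends up as float64.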

Upvotes: 14

Andy Hayden

Reputation: 375445

I think I would use resample (note: if there are duplicate timestamps, this takes their mean):

In [11]: x.resample('D').mean()  # you could use .first() instead
Out[11]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    NaN
2013-11-04    4.0
Freq: D, dtype: float64
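To make the duplicate handling concrete, a small sketch assuming a recent pandas and a hypothetical series `x_dup` with two observations on 2013-11-02: `mean()` averages them, while `first()` keeps the earliest one. Empty bins are NaN with either aggregation, so `fillna(0)` still supplies the zero for 2013-11-03.

```python
import pandas as pd
from datetime import datetime

# Hypothetical data: two observations on 2013-11-02.
x_dup = pd.Series([1, 2, 6, 4],
                  [datetime(2013, 11, 1), datetime(2013, 11, 2),
                   datetime(2013, 11, 2), datetime(2013, 11, 4)])

# mean(): 2013-11-02 becomes (2 + 6) / 2 = 4.0
by_mean = x_dup.resample('D').mean().fillna(0)

# first(): 2013-11-02 keeps the first value, 2
by_first = x_dup.resample('D').first().fillna(0)

print(by_mean)
print(by_first)
```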

In [12]: x.resample('D').mean().fillna(0)
Out[12]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0
Freq: D, dtype: float64

If you'd prefer duplicates to raise an error, use reindex instead:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: int64
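A minimal sketch of both points, assuming a recent pandas: the daily range can be derived from the data with `x.index.min()` and `x.index.max()` instead of hard-coding `periods=4`, and `reindex` refuses a series with a duplicated timestamp by raising `ValueError` (the exact message varies between pandas versions, so it is printed rather than relied on). The names `full_range`, `filled` and `x_dup` are illustrative.

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# Build the daily range from the data instead of hard-coding periods=4.
full_range = pd.date_range(x.index.min(), x.index.max(), freq='D')
filled = x.reindex(full_range, fill_value=0)
print(filled)  # int64 is preserved because no NaN is ever introduced

# Hypothetical data with a duplicated timestamp: reindex refuses it.
x_dup = pd.concat([x, pd.Series([6], [datetime(2013, 11, 2)])])
try:
    x_dup.reindex(full_range, fill_value=0)
    raised = False
except ValueError as err:
    raised = True
    print('reindex raised:', err)
```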

Upvotes: 9
