user818489

Filling missing values in a pandas DataFrame

I'm trying to fill in missing data values in a pandas DataFrame based on the date column, which is the index.

df.head()

            col1 col2 col3
date            
2014-06-20  3    752     4028
2014-06-21  4    752     4028
2014-06-22  32   752     4028
2014-06-25  44   882     4548
2014-06-26  32   882     4548

I tried the following:

idx = pd.date_range(df.index[0], df.index[-1])

df = df.reindex(idx).reset_index()

But I get a DataFrame full of NaNs:

    index       col1 col2   col3
0   2014-06-20  NaN  NaN    NaN
1   2014-06-21  NaN  NaN    NaN
2   2014-06-22  NaN  NaN    NaN
3   2014-06-23  NaN  NaN    NaN
4   2014-06-24  NaN  NaN    NaN

What am I missing here?

Upvotes: 0

Views: 427

Answers (2)

unutbu

Reputation: 880707

The behavior you describe would happen if the index is a pd.Index containing strings, rather than a pd.DatetimeIndex containing timestamps.

For example,

import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index = ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
#        index  col1  col2  col3
# 0 2014-06-20   NaN   NaN   NaN
# 1 2014-06-21   NaN   NaN   NaN
# 2 2014-06-22   NaN   NaN   NaN
# 3 2014-06-23   NaN   NaN   NaN
# 4 2014-06-24   NaN   NaN   NaN
# 5 2014-06-25   NaN   NaN   NaN
# 6 2014-06-26   NaN   NaN   NaN

In contrast, if you convert the index to a DatetimeIndex:

df.index = pd.DatetimeIndex(df.index)

then

print(df.reindex(idx).reset_index())
#        index  col1   col2    col3
# 0 2014-06-20   3.0  752.0  4028.0
# 1 2014-06-21   4.0  752.0  4028.0
# 2 2014-06-22  32.0  752.0  4028.0
# 3 2014-06-23   NaN    NaN     NaN
# 4 2014-06-24   NaN    NaN     NaN
# 5 2014-06-25  44.0  882.0  4548.0
# 6 2014-06-26  32.0  882.0  4548.0
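
To confirm this diagnosis on your own data, it may help to inspect the index type before reindexing. The sketch below rebuilds the example frame with string labels, converts them with pd.to_datetime (an alternative to the pd.DatetimeIndex constructor; the choice between them is a matter of taste here), and then reindexes. The name `filled` is only illustrative and does not come from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])

# With string labels the index is a plain Index, not a DatetimeIndex
print(type(df.index).__name__, df.index.dtype)

# Parse the string labels into timestamps
df.index = pd.to_datetime(df.index)
print(type(df.index).__name__, df.index.dtype)

# Now the labels match the timestamps produced by date_range
idx = pd.date_range(df.index[0], df.index[-1])
filled = df.reindex(idx)

# Count the NaNs per column: only the two inserted dates should be missing
print(filled.isna().sum())
```

If the first print already reports a DatetimeIndex for your frame, the cause is something else, and this conversion will not change the result.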

Upvotes: 2

lakshayg

Reputation: 2173

Pandas has a built-in method for this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html.

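You can use df.asfreq('1D') to convert your data to a daily frequency based on the date index. This inserts a row for each missing date; the new rows hold NaN unless you pass method= (for example 'ffill') or fill_value=. Note that this requires the index to be a DatetimeIndex, not strings.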
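
As a rough sketch of how asfreq behaves on the question's data, assuming the index has already been parsed into timestamps (the names `daily` and `daily_ffill` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=pd.to_datetime(
        ['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26']))

# Conform to a daily frequency: the missing dates appear as NaN rows
daily = df.asfreq('D')
print(daily)

# Same, but carry the last observed row forward into the gaps
daily_ffill = df.asfreq('D', method='ffill')
print(daily_ffill)
```

Whether forward-filling is appropriate depends on what the columns mean; a plain asfreq('D') leaves the gaps as NaN so you can choose a fill strategy afterwards.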

Upvotes: 0
