Reputation:
I'm trying to fill in missing dates in a pandas DataFrame, based on its date index.
df.head()
col1 col2 col3
date
2014-06-20 3 752 4028
2014-06-21 4 752 4028
2014-06-22 32 752 4028
2014-06-25 44 882 4548
2014-06-26 32 882 4548
I tried the following:
idx = pd.date_range(df.index[0], df.index[-1])
df = df.reindex(idx).reset_index()
But I get a DataFrame full of NaNs:
index col1 col2 col3
0 2014-06-20 NaN NaN NaN
1 2014-06-21 NaN NaN NaN
2 2014-06-22 NaN NaN NaN
3 2014-06-23 NaN NaN NaN
4 2014-06-24 NaN NaN NaN
What am I missing here?
Upvotes: 0
Views: 427
Reputation: 880707
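The behavior you describe would happen if the index is a pd.Index containing strings, rather than a pd.DatetimeIndex containing timestamps. reindex matches rows by label, and the Timestamp labels in idx never equal the string labels of the original index, so every row comes back as NaN.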
For example,
import pandas as pd
df = pd.DataFrame(
    {'col1': [3, 4, 32, 44, 32],
     'col2': [752, 752, 752, 882, 882],
     'col3': [4028, 4028, 4028, 4548, 4548]},
    index=['2014-06-20', '2014-06-21', '2014-06-22', '2014-06-25', '2014-06-26'])
idx = pd.date_range(df.index[0], df.index[-1])
print(df.reindex(idx).reset_index())
# index col1 col2 col3
# 0 2014-06-20 NaN NaN NaN
# 1 2014-06-21 NaN NaN NaN
# 2 2014-06-22 NaN NaN NaN
# 3 2014-06-23 NaN NaN NaN
# 4 2014-06-24 NaN NaN NaN
# 5 2014-06-25 NaN NaN NaN
# 6 2014-06-26 NaN NaN NaN
In contrast, if you make the index a DatetimeIndex:
df.index = pd.DatetimeIndex(df.index)
then reindexing keeps the existing values and leaves NaN only for the missing dates:
print(df.reindex(idx).reset_index())
index col1 col2 col3
0 2014-06-20 3 752 4028
1 2014-06-21 4 752 4028
2 2014-06-22 32 752 4028
3 2014-06-23 NaN NaN NaN
4 2014-06-24 NaN NaN NaN
5 2014-06-25 44 882 4548
6 2014-06-26 32 882 4548
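To check which case applies to your own data, here is a minimal sketch that tests the index type before converting. It assumes the index holds date strings, as in the example above; the names index_was_datetime and filled are only illustrative, and pd.to_datetime is used here as an alternative to the pd.DatetimeIndex constructor.

```python
import pandas as pd

# Same sample data as above, with the dates stored as plain strings.
df = pd.DataFrame(
    {"col1": [3, 4, 32, 44, 32],
     "col2": [752, 752, 752, 882, 882],
     "col3": [4028, 4028, 4028, 4548, 4548]},
    index=["2014-06-20", "2014-06-21", "2014-06-22", "2014-06-25", "2014-06-26"],
)

# False here means the labels are not timestamps, so reindexing
# against a date_range cannot match any existing row.
index_was_datetime = isinstance(df.index, pd.DatetimeIndex)
print(index_was_datetime)

# Parse the string labels into timestamps.
df.index = pd.to_datetime(df.index)

# Build the full daily range and reindex; only the two missing
# dates (06-23 and 06-24) are filled with NaN.
idx = pd.date_range(df.index[0], df.index[-1])
filled = df.reindex(idx)
print(filled)
```

If the dates live in a regular column instead of the index, the same conversion applies to that column before calling set_index.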
Upvotes: 2
Reputation: 2173
Pandas has a built-in method for this. Have a look at http://pandas.pydata.org/pandas-docs/stable/timeseries.html.
You can use df.asfreq('1d') to conform your data to a daily frequency. This requires the index to be a DatetimeIndex. The missing dates are inserted as rows of NaN by default; pass the method or fill_value argument if you want them filled in.
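As a rough sketch of how asfreq behaves, assuming the index has already been converted to a DatetimeIndex and using a single column for brevity (the variable names daily, daily_ffill and daily_zero are only illustrative):

```python
import pandas as pd

# Sample data with a proper DatetimeIndex; 06-23 and 06-24 are missing.
df = pd.DataFrame(
    {"col1": [3, 4, 32, 44, 32]},
    index=pd.to_datetime(
        ["2014-06-20", "2014-06-21", "2014-06-22", "2014-06-25", "2014-06-26"]
    ),
)

# Default: the missing days are added as NaN rows.
daily = df.asfreq("D")

# Forward-fill the new rows from the last observed value.
daily_ffill = df.asfreq("D", method="ffill")

# Or fill the new rows with a constant.
daily_zero = df.asfreq("D", fill_value=0)

print(daily)
print(daily_ffill)
print(daily_zero)
```

Which fill is appropriate depends on what the missing days mean in your data; leaving them as NaN keeps the gaps visible.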
Upvotes: 0