import pandas as pd

df = pd.DataFrame(
    {'grp': [1, 2] * 2, 'value': range(4)},
    index=pd.Index(pd.date_range('2016-03-01', periods=7)[::2], name='Date')
).sort_values('grp')
I wanted to group by 'grp' and resample my index daily, forward-filling missing values. I expected this to work:
print(df.groupby('grp').resample('D').ffill())
            grp  value
Date
2016-03-01    1      0
2016-03-05    1      2
2016-03-03    2      1
2016-03-07    2      3
It did not. So I tried this:
print(df.groupby('grp', group_keys=False).apply(lambda g: g.resample('D').ffill()))
            grp  value
Date
2016-03-01    1      0
2016-03-02    1      0
2016-03-03    1      0
2016-03-04    1      0
2016-03-05    1      2
2016-03-03    2      1
2016-03-04    2      1
2016-03-05    2      1
2016-03-06    2      1
2016-03-07    2      3
It did work. Shouldn't these two methods have produced the same output? What am I missing?
Response to ayhan's comment
import sys
print(sys.version)
print(pd.__version__)
2.7.11 |Anaconda custom (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)]
0.18.0
ayhan showed that the two approaches gave the same results on Python 3 with pandas 0.18.1.
After updating pandas to 0.18.1:
2.7.11 |Anaconda custom (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)]
0.18.1
The issue has been resolved.
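For anyone unsure whether they are on the release where this showed up, a small check along these lines may help. It is only a sketch: `version_tuple` is a hypothetical helper written for this post, not part of pandas, and it only flags 0.18.0 because that is the one version reported as misbehaving in this thread.

```python
import re

import pandas as pd


def version_tuple(version):
    """Return the leading numeric parts of a version string as ints.

    Hypothetical helper for illustration; suffixes such as 'rc1' are ignored.
    """
    match = re.match(r'(\d+)\.(\d+)(?:\.(\d+))?', version)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version)
    # A missing third component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


installed = version_tuple(pd.__version__)

if installed == (0, 18, 0):
    # The version on which groupby(...).resample(...).ffill() did not fill here.
    print('pandas 0.18.0 detected: consider upgrading or using the apply form')
else:
    print('pandas %s detected' % pd.__version__)
```

Comparing tuples rather than raw strings avoids ordering mistakes such as '0.9.0' sorting after '0.18.0'.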
Upvotes: 1
Views: 177
This looks like one of the issues caused by the changes to the resample API in version 0.18.0.
It works as expected in 0.18.1:
df.groupby('grp').resample('D').ffill()
Out[2]:
                grp  value
grp Date
1   2016-03-01    1      0
    2016-03-02    1      0
    2016-03-03    1      0
    2016-03-04    1      0
    2016-03-05    1      2
2   2016-03-03    2      1
    2016-03-04    2      1
    2016-03-05    2      1
    2016-03-06    2      1
    2016-03-07    2      3
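To confirm on your own install that the direct form and the `apply` workaround agree, something like the following sketch can be used. It selects the 'value' column before resampling, which is my own choice rather than anything from the question: whether the grouping column is kept in the result can differ between pandas versions, and this sidesteps that. The names `direct` and `via_apply` are just illustrative.

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    {'grp': [1, 2] * 2, 'value': range(4)},
    index=pd.Index(pd.date_range('2016-03-01', periods=7)[::2], name='Date'),
).sort_values('grp')

# Direct form: resample each group daily and forward fill.
# The result is indexed by (grp, Date).
direct = df.groupby('grp')['value'].resample('D').ffill()

# Workaround form from the question, applied group by group.
# With group_keys=False the result is indexed by Date only.
via_apply = df.groupby('grp', group_keys=False)['value'].apply(
    lambda s: s.resample('D').ffill()
)

# Compare the filled values; the two indexes differ by the extra 'grp' level.
print(direct.tolist() == via_apply.tolist())
```

If the comparison prints False on your version, the `apply` form is the one the question reported as reliable.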
Upvotes: 2