Reputation:
let's say i have the following data
date id1 id2 category value
01/01/2019 1000 2000 income 1.0
01/01/2019 1000 2010 income 1.0
01/01/2019 1000 2000 expense 1.0
01/01/2019 1000 2010 expense 1.0
01/02/2019 1000 2000 income 2.0
01/02/2019 1000 2010 income 2.0
01/02/2019 1000 2000 expense 2.0
01/02/2019 1000 2010 expense 2.0
01/04/2019 1000 2000 income 3.0
01/04/2019 1000 2010 income 3.0
01/04/2019 1000 2000 expense 3.0
01/04/2019 1000 2010 expense 3.0
I would like to fill in the missing date 01/03/2019, but also a row for each combination of id1, id2 and category. so in my case, 4 rows would be added:
date id1 id2 category value
01/03/2019 1000 2000 income 2.0
01/03/2019 1000 2010 income 2.0
01/03/2019 1000 2000 expense 2.0
01/03/2019 1000 2010 expense 2.0
I am familiar with back filling and forward filling dates when that is the the only index, but the above particular problem of backfilling by the combination of values from multiple columns is giving me trouble. Any thoughts of an easy way to do this using pandas?
Upvotes: 1
Views: 122
Reputation: 323226
It is a pivot problem 1st , then become a resample
and ffill
problem
df.date=pd.to_datetime(df.date)
df['key']=df.groupby('date').cumcount()
newdf=df.set_index(['date','key']).unstack().resample('D').mean().ffill().stack().reset_index(level=0)
newdf
date id1 id2 value
key
0 2019-01-01 1000.0 2000.0 1.0
1 2019-01-01 1000.0 2010.0 1.0
2 2019-01-01 1000.0 2000.0 1.0
3 2019-01-01 1000.0 2010.0 1.0
0 2019-01-02 1000.0 2000.0 2.0
1 2019-01-02 1000.0 2010.0 2.0
2 2019-01-02 1000.0 2000.0 2.0
3 2019-01-02 1000.0 2010.0 2.0
0 2019-01-03 1000.0 2000.0 2.0
1 2019-01-03 1000.0 2010.0 2.0
2 2019-01-03 1000.0 2000.0 2.0
3 2019-01-03 1000.0 2010.0 2.0
0 2019-01-04 1000.0 2000.0 3.0
1 2019-01-04 1000.0 2010.0 3.0
2 2019-01-04 1000.0 2000.0 3.0
3 2019-01-04 1000.0 2010.0 3.0
Upvotes: 2