Reputation: 415
I have a list of dates and a dataframe. Now the dataframe has an id column and other values that are not consistent for all dates. I want to fill zeros in all columns for the ids and dates where there is no data. Let me show you by example:
date id clicks conv rev
2019-01-21 234 34 1 10
2019-01-21 235 32 0 0
2019-01-24 234 56 2 20
2019-01-23 235 23 3 30
date list is like this:
[2019-01-01, 2019-01-02,2019-01-03 ....2019-02-28]
What I want is to add zeros for all the missing dates in the dataframe for all ids. So the resultant df should look like:
date id clicks conv rev
2019-01-01 234 0 0 0
2019-01-01 235 0 0 0
. . . .
. . . .
2019-01-21 234 34 1 10
2019-01-21 235 32 0 0
2019-01-22 234 0 0 0
2019-01-22 235 0 0 0
2019-01-23 234 0 0 0
2019-01-23 235 0 0 0
2019-01-24 234 56 2 20
2019-01-23 235 23 3 30
. . . .
2019-02-28 0 0 0 0
Upvotes: 0
Views: 2590
Reputation: 59549
With set_index
+ reindex
from the cartesian product of values. Here I'll create the dates with pd.date_range
to save some typing, and ensure dates are datetime
import pandas as pd
df['date'] = pd.to_datetime(df.date)
my_dates = pd.date_range('2019-01-01', '2019-02-28', freq='D')
idx = pd.MultiIndex.from_product([my_dates, df.id.unique()], names=['date', 'id'])
df = df.set_index(['date', 'id']).reindex(idx).fillna(0).reset_index()
df
date id clicks conv rev
0 2019-01-01 234 0.0 0.0 0.0
1 2019-01-01 235 0.0 0.0 0.0
...
45 2019-01-23 235 23.0 3.0 30.0
46 2019-01-24 234 56.0 2.0 20.0
47 2019-01-24 235 0.0 0.0 0.0
...
115 2019-02-27 235 0.0 0.0 0.0
116 2019-02-28 234 0.0 0.0 0.0
117 2019-02-28 235 0.0 0.0 0.0
Upvotes: 2