How to fill in missing dates for each group after groupby in pandas?

Question

My goal is to fill in the missing monthly dates for each project_id, with 0 in the data column.

For example

import pandas as pd

df = pd.DataFrame({
    'project_id': ['A', 'A', 'A', 'B', 'B'], 
    'timestamp': ['2018-01-01', '2018-03-01', '2018-04-01', '2018-03-01', '2018-06-01'], 
    'data': [100, 28, 45, 64, 55]})

which looks like this:

  project_id   timestamp  data
0          A  2018-01-01   100
1          A  2018-03-01    28
2          A  2018-04-01    45
3          B  2018-03-01    64
4          B  2018-06-01    55

and should become:

  project_id   timestamp  data
0          A  2018-01-01   100
1          A  2018-02-01     0
2          A  2018-03-01    28
3          A  2018-04-01    45
4          B  2018-03-01    64
5          B  2018-04-01     0
6          B  2018-05-01     0
7          B  2018-06-01    55

where indices 1, 5, and 6 are added.

My current approach:

df.groupby('project_id').apply(lambda x: x[['timestamp', 'data']].set_index('timestamp').asfreq('M', how='start', fill_value=0))

is obviously wrong: it sets every value to 0, and it resamples to the last day of each month instead of the first, although I thought the how parameter would handle that.
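A minimal sketch of why that attempt fails, using a small hypothetical Series built from project A's rows. The 'M' alias means month end (pd.offsets.MonthEnd; newer pandas spells it 'ME'), so asfreq reindexes onto dates such as 2018-01-31 that match none of the original month-start timestamps, and every row receives fill_value=0. The how argument of asfreq is documented for PeriodIndex only, so it has no effect on a DatetimeIndex. The sketch passes the offset objects directly so it runs regardless of which alias spelling the installed pandas accepts.

```python
import pandas as pd

# Hypothetical example: project A's rows as a Series with a DatetimeIndex
s = pd.Series([100, 28, 45],
              index=pd.to_datetime(['2018-01-01', '2018-03-01', '2018-04-01']))

# Month-end grid ('M' / 'ME'): 2018-01-31, 2018-02-28, 2018-03-31.
# None of these match the month-start dates, so every value becomes 0.
month_end = s.asfreq(pd.offsets.MonthEnd(), fill_value=0)

# Month-start grid ('MS'): the original dates are kept and only the gap is filled
month_start = s.asfreq(pd.offsets.MonthBegin(), fill_value=0)

print(month_end)
print(month_start)
```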

How do I fill in the missing datetime entries after a groupby so that each group gets a continuous monthly time series?

Quang Hoang · Accepted Answer

You are close:

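import pandas as pd

# asfreq needs a DatetimeIndex, so convert the strings first
df.timestamp = pd.to_datetime(df.timestamp)

# notice 'MS' (month start) instead of 'M' (month end)
new_df = df.groupby('project_id').apply(lambda x: x[['timestamp', 'data']]
                                                    .set_index('timestamp').asfreq('MS'))

# copy the original values back, aligned on (project_id, timestamp)
new_df.data = df.set_index(['project_id', 'timestamp']).data
# months added by asfreq are NaN; fill them with 0
# (this leaves 'data' as float; add .astype({'data': int}) if integers are needed)
df = new_df.fillna(0).reset_index()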
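For comparison, a sketch of the same fill done without groupby.apply: loop over the groups, build each group's month-start range with pd.date_range(..., freq='MS'), and reindex with fill_value=0. This is an alternative technique, not the accepted answer's code, and the names pieces, months, and filled are illustrative only. Because the gaps are filled with 0 directly rather than going through NaN, the data column stays integer.

```python
import pandas as pd

df = pd.DataFrame({
    'project_id': ['A', 'A', 'A', 'B', 'B'],
    'timestamp': ['2018-01-01', '2018-03-01', '2018-04-01', '2018-03-01', '2018-06-01'],
    'data': [100, 28, 45, 64, 55]})
df['timestamp'] = pd.to_datetime(df['timestamp'])

pieces = []
for project_id, group in df.groupby('project_id'):
    # Every month start between this group's first and last timestamp
    months = pd.date_range(group['timestamp'].min(), group['timestamp'].max(),
                           freq='MS', name='timestamp')
    # Conform the group's values to that grid; missing months get 0
    filled = (group.set_index('timestamp')['data']
                   .reindex(months, fill_value=0)
                   .reset_index())
    filled.insert(0, 'project_id', project_id)
    pieces.append(filled)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

If a project can have more than one row in the same month, reindex raises on the duplicate timestamps; aggregating first (for example with df.groupby(['project_id', 'timestamp'], as_index=False)['data'].sum()) avoids that.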
