Reputation: 67
So I'm going to try and explain this, but the outputs below make much more sense! I have a dataframe, df, and I want the whole date range to be the min and max of the Date column for each code. Just wondered whether this was possible? The desired output shown below I hope will explain this.
Output:
Code Count
Date
2021-01-31 H8101 15
2021-02-28 H8101 7
2021-03-31 H8101 19
2021-04-30 H8101 12
2021-05-31 H8101 20
2021-06-30 H8101 12
2021-07-31 H8101 7
2021-08-31 H8101 10
2021-09-30 H8101 16
2021-10-31 H8101 4
2021-03-31 H8162 1
2021-05-31 H8162 2
2021-09-30 H8162 1
2019-12-31 H8406 2
Desired Output:
Code Count
Date
2019-12-31 H8101 NaN
2020-01-31 H8101 NaN
2020-02-28 H8101 NaN
2020-03-31 H8101 NaN
2020-04-30 H8101 NaN
2020-05-31 H8101 NaN
2020-06-30 H8101 NaN
2020-07-31 H8101 NaN
2020-08-31 H8101 NaN
2020-09-30 H8101 NaN
2020-10-31 H8101 NaN
2020-11-30 H8101 NaN
2020-12-31 H8101 NaN
2021-01-31 H8101 15
2021-02-28 H8101 7
2021-03-31 H8101 19
2021-04-30 H8101 12
2021-05-31 H8101 20
2021-06-30 H8101 12
2021-07-31 H8101 7
2021-08-31 H8101 10
2021-09-30 H8101 16
2021-10-31 H8101 4
2019-12-31 H8162 NaN
2020-01-31 H8162 NaN
2020-02-28 H8162 NaN
2020-03-31 H8162 NaN
2020-04-30 H8162 NaN
2020-05-31 H8162 NaN
2020-06-30 H8162 NaN
2020-07-31 H8162 NaN
2020-08-31 H8162 NaN
2020-09-30 H8162 NaN
2020-10-31 H8162 NaN
2020-11-30 H8162 NaN
2020-12-31 H8162 NaN
2021-01-31 H8162 NaN
2021-02-28 H8162 NaN
2021-03-31 H8162 1
2021-04-30 H8162 NaN
2021-05-31 H8162 2
2021-06-30 H8162 NaN
2021-07-31 H8162 NaN
2021-08-31 H8162 NaN
2021-09-30 H8162 1
2021-10-31 H8162 NaN
2019-12-31 H8406 2
2020-01-31 H8406 NaN
2020-02-28 H8406 NaN
2020-03-31 H8406 NaN
2020-04-30 H8406 NaN
2020-05-31 H8406 NaN
2020-06-30 H8406 NaN
2020-07-31 H8406 NaN
2020-08-31 H8406 NaN
2020-09-30 H8406 NaN
2020-10-31 H8406 NaN
2020-11-30 H8406 NaN
2020-12-31 H8406 NaN
2021-01-31 H8406 NaN
2021-02-28 H8406 NaN
2021-03-31 H8406 NaN
2021-04-30 H8406 NaN
2021-05-31 H8406 NaN
2021-06-30 H8406 NaN
2021-07-31 H8406 NaN
2021-08-31 H8406 NaN
2021-09-30 H8406 NaN
2021-10-31 H8406 NaN
Upvotes: 0
Views: 36
Reputation: 3598
Given df
dataframe as shown in question, first create index idx
containing full range of daily dates:
idx = pd.date_range(df.Date.min(),df.Date.max(), name='Date')
Then create function reindexing dataframe using idx
:
def df_resample(df_):
return df_.set_index('Date').reindex(idx).resample('M').first()
Finally apply
above function to each group of Code
in dataframe df
df_result = df.groupby(['Code']).apply(df_resample).drop(columns='Code').reset_index()
The result is:
Code Date Count
0 H8101 2019-12-31
1 H8101 2020-01-31
2 H8101 2020-02-29
3 H8101 2020-03-31
4 H8101 2020-04-30
5 H8101 2020-05-31
6 H8101 2020-06-30
7 H8101 2020-07-31
8 H8101 2020-08-31
9 H8101 2020-09-30
10 H8101 2020-10-31
11 H8101 2020-11-30
12 H8101 2020-12-31
13 H8101 2021-01-31 15.0
14 H8101 2021-02-28 7.0
15 H8101 2021-03-31 19.0
16 H8101 2021-04-30 12.0
17 H8101 2021-05-31 20.0
18 H8101 2021-06-30 12.0
19 H8101 2021-07-31 7.0
20 H8101 2021-08-31 10.0
21 H8101 2021-09-30 16.0
22 H8101 2021-10-31 4.0
23 H8162 2019-12-31
24 H8162 2020-01-31
25 H8162 2020-02-29
26 H8162 2020-03-31
27 H8162 2020-04-30
28 H8162 2020-05-31
29 H8162 2020-06-30
30 H8162 2020-07-31
31 H8162 2020-08-31
32 H8162 2020-09-30
33 H8162 2020-10-31
34 H8162 2020-11-30
35 H8162 2020-12-31
36 H8162 2021-01-31
37 H8162 2021-02-28
38 H8162 2021-03-31 1.0
39 H8162 2021-04-30
40 H8162 2021-05-31 2.0
41 H8162 2021-06-30
42 H8162 2021-07-31
43 H8162 2021-08-31
44 H8162 2021-09-30 1.0
45 H8162 2021-10-31
46 H8406 2019-12-31 2.0
47 H8406 2020-01-31
48 H8406 2020-02-29
49 H8406 2020-03-31
50 H8406 2020-04-30
51 H8406 2020-05-31
52 H8406 2020-06-30
53 H8406 2020-07-31
54 H8406 2020-08-31
55 H8406 2020-09-30
56 H8406 2020-10-31
57 H8406 2020-11-30
58 H8406 2020-12-31
59 H8406 2021-01-31
60 H8406 2021-02-28
61 H8406 2021-03-31
62 H8406 2021-04-30
63 H8406 2021-05-31
64 H8406 2021-06-30
65 H8406 2021-07-31
66 H8406 2021-08-31
67 H8406 2021-09-30
68 H8406 2021-10-31
Upvotes: 1