Reputation: 81
I have a table of entries as listed below. The table has missing date entries which I want to populate with end of the month missing date values. For the 1st columns, the added rows of missing date entries should be populated with the same values and for value columns I am expecting them to be populated with 0.
FirstName MiddleName LastName Date Value1 Value2 Value3
first1 middle1 last1 1/31/2020 51 80 19
first1 middle1 last1 2/29/2020 14 44 56
first1 middle1 last1 4/30/2020 57 96 40
first1 middle1 last1 6/30/2020 58 65 3
first1 middle1 last1 8/31/2020 1 34 4
first1 middle1 last1 10/31/2020 40 38 53
first1 middle1 last1 12/31/2020 93 65 41
first1 middle1 last1 2/28/2021 3 43 0
first1 middle1 last1 4/30/2021 46 61 52
first2 middle2 last2 1/31/2020 64 19 33
first2 middle2 last2 2/29/2020 28 71 16
first2 middle2 last2 4/30/2020 2 94 78
first2 middle2 last2 5/31/2020 78 99 87
first2 middle2 last2 6/30/2020 10 70 14
first2 middle2 last2 7/31/2020 30 30 59
first2 middle2 last2 8/31/2020 55 96 73
first2 middle2 last2 10/31/2020 22 43 23
first2 middle2 last2 11/30/2020 12 4 84
first2 middle2 last2 1/31/2021 59 93 1
first2 middle2 last2 2/28/2021 19 33 52
first2 middle2 last2 3/31/2021 46 12 97
first2 middle2 last2 4/30/2021 41 44 59
first2 middle2 last2 5/31/2021 67 84 96
first2 middle2 last2 6/30/2021 52 69 78
first3 middle3 last3 4/30/2020 5 63 30
first3 middle3 last3 5/31/2020 45 22 7
first3 middle3 last3 6/30/2020 76 2 33
first3 middle3 last3 8/31/2020 81 25 52
first3 middle3 last3 9/30/2020 55 3 32
first3 middle3 last3 11/30/2020 46 45 80
first3 middle3 last3 12/31/2020 17 81 74
first3 middle3 last3 1/31/2021 98 6 55
'''
Expected Output
FirstName MiddleName LastName Date Value1 Value2 Value3
first1 middle1 last1 1/31/2020 51 80 19
first1 middle1 last1 2/29/2020 14 44 56
first1 middle1 last1 3/31/2020 0 0 0
first1 middle1 last1 4/30/2020 57 96 40
first1 middle1 last1 5/31/2020 0 0 0
first1 middle1 last1 6/30/2020 58 65 3
first1 middle1 last1 7/31/2020 0 0 0
first1 middle1 last1 8/31/2020 1 34 4
first1 middle1 last1 9/30/2020 0 0 0
first1 middle1 last1 10/31/2020 40 38 53
first1 middle1 last1 11/30/2020 0 0 0
first1 middle1 last1 12/31/2020 93 65 41
first1 middle1 last1 1/31/2021 0 0 0
first1 middle1 last1 2/28/2021 3 43 0
first1 middle1 last1 3/31/2021 0 0 0
first1 middle1 last1 4/30/2021 46 61 52
first2 middle2 last2 1/31/2020 64 19 33
first2 middle2 last2 2/29/2020 28 71 16
first2 middle2 last2 3/31/2020 0 0 0
first2 middle2 last2 4/30/2020 2 94 78
first2 middle2 last2 5/31/2020 78 99 87
first2 middle2 last2 6/30/2020 10 70 14
first2 middle2 last2 7/31/2020 30 30 59
first2 middle2 last2 8/31/2020 55 96 73
first2 middle2 last2 9/30/2020 0 0 0
first2 middle2 last2 10/31/2020 22 43 23
first2 middle2 last2 11/30/2020 12 4 84
first2 middle2 last2 12/31/2020 0 0 0
first2 middle2 last2 1/31/2021 59 93 1
first2 middle2 last2 2/28/2021 19 33 52
first2 middle2 last2 3/31/2021 46 12 97
first2 middle2 last2 4/30/2021 41 44 59
first2 middle2 last2 5/31/2021 67 84 96
first2 middle2 last2 6/30/2021 52 69 78
first3 middle3 last3 4/30/2020 5 63 30
first3 middle3 last3 5/31/2020 45 22 7
first3 middle3 last3 6/30/2020 76 2 33
first3 middle3 last3 7/31/2020 0 0 0
first3 middle3 last3 8/31/2020 81 25 52
first3 middle3 last3 9/30/2020 55 3 32
first3 middle3 last3 10/31/2020 0 0 0
first3 middle3 last3 11/30/2020 46 45 80
first3 middle3 last3 12/31/2020 17 81 74
first3 middle3 last3 1/31/2021 98 6 55
I have tried to use the resample with the dataframe but I am not getting the desired output. df = df.set_index('Date').resample('M').ffill().reset_index()
Upvotes: 1
Views: 78
Reputation: 16683
EDIT: There is a much better solution than the second one below:
You were close on your logic. You can just include .groupby()
before .resample
and .asfreq()
after.
df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
.groupby(cols)
.resample('M')
.asfreq(fill_value=0)
.drop(cols, axis=1)
.reset_index())
df
Out[128]:
FirstName MiddleName LastName Date Value1 Value2 Value3
0 first1 middle1 last1 2020-01-31 51 80 19
1 first1 middle1 last1 2020-02-29 14 44 56
2 first1 middle1 last1 2020-03-31 0 0 0
3 first1 middle1 last1 2020-04-30 57 96 40
4 first1 middle1 last1 2020-05-31 0 0 0
5 first1 middle1 last1 2020-06-30 58 65 3
6 first1 middle1 last1 2020-07-31 0 0 0
7 first1 middle1 last1 2020-08-31 1 34 4
8 first1 middle1 last1 2020-09-30 0 0 0
9 first1 middle1 last1 2020-10-31 40 38 53
10 first1 middle1 last1 2020-11-30 0 0 0
11 first1 middle1 last1 2020-12-31 93 65 41
12 first1 middle1 last1 2021-01-31 0 0 0
13 first1 middle1 last1 2021-02-28 3 43 0
14 first1 middle1 last1 2021-03-31 0 0 0
15 first1 middle1 last1 2021-04-30 46 61 52
16 first2 middle2 last2 2020-01-31 64 19 33
17 first2 middle2 last2 2020-02-29 28 71 16
18 first2 middle2 last2 2020-03-31 0 0 0
19 first2 middle2 last2 2020-04-30 2 94 78
20 first2 middle2 last2 2020-05-31 78 99 87
21 first2 middle2 last2 2020-06-30 10 70 14
22 first2 middle2 last2 2020-07-31 30 30 59
23 first2 middle2 last2 2020-08-31 55 96 73
24 first2 middle2 last2 2020-09-30 0 0 0
25 first2 middle2 last2 2020-10-31 22 43 23
26 first2 middle2 last2 2020-11-30 12 4 84
27 first2 middle2 last2 2020-12-31 0 0 0
28 first2 middle2 last2 2021-01-31 59 93 1
29 first2 middle2 last2 2021-02-28 19 33 52
30 first2 middle2 last2 2021-03-31 46 12 97
31 first2 middle2 last2 2021-04-30 41 44 59
32 first2 middle2 last2 2021-05-31 67 84 96
33 first2 middle2 last2 2021-06-30 52 69 78
34 first3 middle3 last3 2020-04-30 5 63 30
35 first3 middle3 last3 2020-05-31 45 22 7
36 first3 middle3 last3 2020-06-30 76 2 33
37 first3 middle3 last3 2020-07-31 0 0 0
38 first3 middle3 last3 2020-08-31 81 25 52
39 first3 middle3 last3 2020-09-30 55 3 32
40 first3 middle3 last3 2020-10-31 0 0 0
41 first3 middle3 last3 2020-11-30 46 45 80
42 first3 middle3 last3 2020-12-31 17 81 74
43 first3 middle3 last3 2021-01-31 98 6 55
Method #2 with reindex
(more complicated)
It gets a little more complicated, because you need to fill values within a group. I'm not sure if there is a simpler way than the following solution:
date_range()
with freq='M'
that will contain missing end of month date values. You will later use this to reindex.FullName
. This new column will also be passed when reindexing, so that you reindex within the group.MultiIndex
made of #1 and #2 above by utilizing pd.MultiIndex.from_product()
set_index()
and set FirstName
and Date
as the index. Next you will .reindex
the dataframe passing the mmultiindex we created in #3.FirstName
, MiddleName
, and LastName
with the correct value by using .groupby
, and .transform(max)
FullName
.You need to use pd.MultiIndex.from_product()
and reindex by FirstName
and a date_range
of dates called s
. This is the crux of it that will allow you to set_index
and reindex
to fill in the missing dates.
import pandas as pd
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
s = pd.date_range(df['Date'].min(), df['Date'].max(), freq='M')
df['FullName'] = df['FirstName'] + df['MiddleName'] + df['LastName']
idx = pd.MultiIndex.from_product([df['FullName'].unique(), s], names=['FullName', 'Date'])
df = df.set_index(['FullName', 'Date']).reindex(idx, fill_value='0').reset_index()
df['FirstName'] = df.groupby('FullName')['FirstName'].transform('max')
df['MiddleName'] = df.groupby('FullName')['MiddleName'].transform('max')
df['LastName'] = df.groupby('FullName')['LastName'].transform('max')
df = df[['FirstName', 'MiddleName', 'LastName', 'Date', 'Value1', 'Value2',
'Value3']].replace('0',0)
df
Out[98]:
FirstName MiddleName LastName Date Value1 Value2 Value3
0 first1 middle1 last1 2020-01-31 51 80 19
1 first1 middle1 last1 2020-02-29 14 44 56
2 first1 middle1 last1 2020-03-31 0 0 0
3 first1 middle1 last1 2020-04-30 57 96 40
4 first1 middle1 last1 2020-05-31 0 0 0
5 first1 middle1 last1 2020-06-30 58 65 3
6 first1 middle1 last1 2020-07-31 0 0 0
7 first1 middle1 last1 2020-08-31 1 34 4
8 first1 middle1 last1 2020-09-30 0 0 0
9 first1 middle1 last1 2020-10-31 40 38 53
10 first1 middle1 last1 2020-11-30 0 0 0
11 first1 middle1 last1 2020-12-31 93 65 41
12 first1 middle1 last1 2021-01-31 0 0 0
13 first1 middle1 last1 2021-02-28 3 43 0
14 first1 middle1 last1 2021-03-31 0 0 0
15 first1 middle1 last1 2021-04-30 46 61 52
16 first1 middle1 last1 2021-05-31 0 0 0
17 first1 middle1 last1 2021-06-30 0 0 0
18 first2 middle2 last2 2020-01-31 64 19 33
19 first2 middle2 last2 2020-02-29 28 71 16
20 first2 middle2 last2 2020-03-31 0 0 0
21 first2 middle2 last2 2020-04-30 2 94 78
22 first2 middle2 last2 2020-05-31 78 99 87
23 first2 middle2 last2 2020-06-30 10 70 14
24 first2 middle2 last2 2020-07-31 30 30 59
25 first2 middle2 last2 2020-08-31 55 96 73
26 first2 middle2 last2 2020-09-30 0 0 0
27 first2 middle2 last2 2020-10-31 22 43 23
28 first2 middle2 last2 2020-11-30 12 4 84
29 first2 middle2 last2 2020-12-31 0 0 0
30 first2 middle2 last2 2021-01-31 59 93 1
31 first2 middle2 last2 2021-02-28 19 33 52
32 first2 middle2 last2 2021-03-31 46 12 97
33 first2 middle2 last2 2021-04-30 41 44 59
34 first2 middle2 last2 2021-05-31 67 84 96
35 first2 middle2 last2 2021-06-30 52 69 78
36 first3 middle3 last3 2020-01-31 0 0 0
37 first3 middle3 last3 2020-02-29 0 0 0
38 first3 middle3 last3 2020-03-31 0 0 0
39 first3 middle3 last3 2020-04-30 5 63 30
40 first3 middle3 last3 2020-05-31 45 22 7
41 first3 middle3 last3 2020-06-30 76 2 33
42 first3 middle3 last3 2020-07-31 0 0 0
43 first3 middle3 last3 2020-08-31 81 25 52
44 first3 middle3 last3 2020-09-30 55 3 32
45 first3 middle3 last3 2020-10-31 0 0 0
46 first3 middle3 last3 2020-11-30 46 45 80
47 first3 middle3 last3 2020-12-31 17 81 74
48 first3 middle3 last3 2021-01-31 98 6 55
49 first3 middle3 last3 2021-02-28 0 0 0
50 first3 middle3 last3 2021-03-31 0 0 0
51 first3 middle3 last3 2021-04-30 0 0 0
52 first3 middle3 last3 2021-05-31 0 0 0
53 first3 middle3 last3 2021-06-30 0 0 0
Upvotes: 2