Bappa
Bappa

Reputation: 81

Missing Date Values

I have a table of entries as listed below. The table has missing date entries which I want to populate with end of the month missing date values. For the 1st columns, the added rows of missing date entries should be populated with the same values and for value columns I am expecting them to be populated with 0.

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

'''

Expected Output

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   3/31/2020   0   0   0
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   5/31/2020   0   0   0
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   7/31/2020   0   0   0
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   9/30/2020   0   0   0
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   11/30/2020  0   0   0
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   1/31/2021   0   0   0
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   3/31/2021   0   0   0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   3/31/2020   0   0   0
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   9/30/2020   0   0   0
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   12/31/2020  0   0   0
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   7/31/2020   0   0   0
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   10/31/2020  0   0   0
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

I have tried to use the resample with the dataframe but I am not getting the desired output. df = df.set_index('Date').resample('M').ffill().reset_index()

Upvotes: 1

Views: 78

Answers (1)

David Erickson
David Erickson

Reputation: 16683

EDIT: There is a much better solution than the second one below:

You were close on your logic. You can just include .groupby() before .resample and .asfreq() after.

df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
        .groupby(cols)
        .resample('M')
        .asfreq(fill_value=0)
        .drop(cols, axis=1)
        .reset_index())
df
Out[128]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first2    middle2    last2 2020-01-31      64      19      33
17    first2    middle2    last2 2020-02-29      28      71      16
18    first2    middle2    last2 2020-03-31       0       0       0
19    first2    middle2    last2 2020-04-30       2      94      78
20    first2    middle2    last2 2020-05-31      78      99      87
21    first2    middle2    last2 2020-06-30      10      70      14
22    first2    middle2    last2 2020-07-31      30      30      59
23    first2    middle2    last2 2020-08-31      55      96      73
24    first2    middle2    last2 2020-09-30       0       0       0
25    first2    middle2    last2 2020-10-31      22      43      23
26    first2    middle2    last2 2020-11-30      12       4      84
27    first2    middle2    last2 2020-12-31       0       0       0
28    first2    middle2    last2 2021-01-31      59      93       1
29    first2    middle2    last2 2021-02-28      19      33      52
30    first2    middle2    last2 2021-03-31      46      12      97
31    first2    middle2    last2 2021-04-30      41      44      59
32    first2    middle2    last2 2021-05-31      67      84      96
33    first2    middle2    last2 2021-06-30      52      69      78
34    first3    middle3    last3 2020-04-30       5      63      30
35    first3    middle3    last3 2020-05-31      45      22       7
36    first3    middle3    last3 2020-06-30      76       2      33
37    first3    middle3    last3 2020-07-31       0       0       0
38    first3    middle3    last3 2020-08-31      81      25      52
39    first3    middle3    last3 2020-09-30      55       3      32
40    first3    middle3    last3 2020-10-31       0       0       0
41    first3    middle3    last3 2020-11-30      46      45      80
42    first3    middle3    last3 2020-12-31      17      81      74
43    first3    middle3    last3 2021-01-31      98       6      55

Method #2 with reindex (more complicated)

It gets a little more complicated, because you need to fill values within a group. I'm not sure if there is a simpler way than the following solution:

  1. You can create a date_range() with freq='M' that will contain missing end of month date values. You will later use this to reindex.
  2. You are reindexing within groups, so you can create a unique identifier for the group called FullName. This new column will also be passed when reindexing, so that you reindex within the group.
  3. Create the MultiIndex made of #1 and #2 above by utilizing pd.MultiIndex.from_product()
  4. Use set_index() and set FirstName and Date as the index. Next you will .reindex the dataframe passing the mmultiindex we created in #3.
  5. Clean up the dataframe, so that you fill FirstName, MiddleName, and LastName with the correct value by using .groupby, and .transform(max)
  6. Finally, clean up the order of the columns, excluding the temporary column I created earlier, FullName.

You need to use pd.MultiIndex.from_product() and reindex by FirstName and a date_range of dates called s. This is the crux of it that will allow you to set_index and reindex to fill in the missing dates.

import pandas as pd
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
s = pd.date_range(df['Date'].min(), df['Date'].max(), freq='M')
df['FullName'] = df['FirstName'] + df['MiddleName'] + df['LastName']
idx = pd.MultiIndex.from_product([df['FullName'].unique(), s], names=['FullName', 'Date'])
df = df.set_index(['FullName', 'Date']).reindex(idx, fill_value='0').reset_index()
df['FirstName'] = df.groupby('FullName')['FirstName'].transform('max')
df['MiddleName'] = df.groupby('FullName')['MiddleName'].transform('max')
df['LastName'] = df.groupby('FullName')['LastName'].transform('max')
df = df[['FirstName', 'MiddleName', 'LastName', 'Date', 'Value1', 'Value2',
       'Value3']].replace('0',0)
df
Out[98]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first1    middle1    last1 2021-05-31       0       0       0
17    first1    middle1    last1 2021-06-30       0       0       0
18    first2    middle2    last2 2020-01-31      64      19      33
19    first2    middle2    last2 2020-02-29      28      71      16
20    first2    middle2    last2 2020-03-31       0       0       0
21    first2    middle2    last2 2020-04-30       2      94      78
22    first2    middle2    last2 2020-05-31      78      99      87
23    first2    middle2    last2 2020-06-30      10      70      14
24    first2    middle2    last2 2020-07-31      30      30      59
25    first2    middle2    last2 2020-08-31      55      96      73
26    first2    middle2    last2 2020-09-30       0       0       0
27    first2    middle2    last2 2020-10-31      22      43      23
28    first2    middle2    last2 2020-11-30      12       4      84
29    first2    middle2    last2 2020-12-31       0       0       0
30    first2    middle2    last2 2021-01-31      59      93       1
31    first2    middle2    last2 2021-02-28      19      33      52
32    first2    middle2    last2 2021-03-31      46      12      97
33    first2    middle2    last2 2021-04-30      41      44      59
34    first2    middle2    last2 2021-05-31      67      84      96
35    first2    middle2    last2 2021-06-30      52      69      78
36    first3    middle3    last3 2020-01-31       0       0       0
37    first3    middle3    last3 2020-02-29       0       0       0
38    first3    middle3    last3 2020-03-31       0       0       0
39    first3    middle3    last3 2020-04-30       5      63      30
40    first3    middle3    last3 2020-05-31      45      22       7
41    first3    middle3    last3 2020-06-30      76       2      33
42    first3    middle3    last3 2020-07-31       0       0       0
43    first3    middle3    last3 2020-08-31      81      25      52
44    first3    middle3    last3 2020-09-30      55       3      32
45    first3    middle3    last3 2020-10-31       0       0       0
46    first3    middle3    last3 2020-11-30      46      45      80
47    first3    middle3    last3 2020-12-31      17      81      74
48    first3    middle3    last3 2021-01-31      98       6      55
49    first3    middle3    last3 2021-02-28       0       0       0
50    first3    middle3    last3 2021-03-31       0       0       0
51    first3    middle3    last3 2021-04-30       0       0       0
52    first3    middle3    last3 2021-05-31       0       0       0
53    first3    middle3    last3 2021-06-30       0       0       0

Upvotes: 2

Related Questions