spcol
spcol

Reputation: 433

Infill missing data with replacement data

I have a timeseries dataframe in daily granularity for 5 columns

            A  B  C  D  E 
31/08/2021  1  4  3  8  9
01/09/2021  8  9  3  1  0
.
.
.
13/09/2021 8 9 0 9 3

And I have a table of monthly 'normal' values;

   A  B  C  D  E
1  8  3  3  3  1
2  4  5  6  4  6
3  6  4  6  4  2
.
.
.
12 4 6 6  6 4

Essentially the raw data I'm using is messy and I need a way of infilling any gaps with its respective 'normal' value when data is missing.

So if for example there was no data for 21/03/2021 in the B column it would infill '4' as the normal value for B in March from the table.

Really struggling with this so any help is much appreciated!

Upvotes: 0

Views: 91

Answers (2)

Miquel Escobar
Miquel Escobar

Reputation: 380

You can do the following, using pd.to_datetime and fill_na functions:

def get_month(date):
    date = pd.to_datetime(date)
    return date.month

daily_df.apply(axis=1,  lambda row: row.fill_na(monthly_df[get_month(row.index)]))

Upvotes: 0

jezrael
jezrael

Reputation: 862471

Use DataFrame.asfreq for add missing datetimes and then replace missing values with convert datetimes to months, last convert to original datetimes:

df1.index = pd.to_datetime(df1.index, dayfirst=True)

df1 = df1.asfreq('d')
df1 = df1.set_index(df1.index.month).fillna(df2).set_index(df1.index)

Changed sample data for test:

print (df1)
            A  B  C  D  E
30/07/2021  1  4  3  8  9
01/09/2021  8  9  3  1  0
13/09/2021  8  9  0  9  3

print (df2)
   A  B  C  D  E
1  8  3  3  3  1
7  4  5  6  4  6
8  6  4  6  4  2
9  4  6  6  6  4
    
df1.index = pd.to_datetime(df1.index, dayfirst=True)

df1 = df1.asfreq('d')
df1 = df1.set_index(df1.index.month).fillna(df2).set_index(df1.index)

print (df1)
              A    B    C    D    E
2021-07-30  1.0  4.0  3.0  8.0  9.0
2021-07-31  4.0  5.0  6.0  4.0  6.0
2021-08-01  6.0  4.0  6.0  4.0  2.0
2021-08-02  6.0  4.0  6.0  4.0  2.0
2021-08-03  6.0  4.0  6.0  4.0  2.0
2021-08-04  6.0  4.0  6.0  4.0  2.0
2021-08-05  6.0  4.0  6.0  4.0  2.0
2021-08-06  6.0  4.0  6.0  4.0  2.0
2021-08-07  6.0  4.0  6.0  4.0  2.0
2021-08-08  6.0  4.0  6.0  4.0  2.0
2021-08-09  6.0  4.0  6.0  4.0  2.0
2021-08-10  6.0  4.0  6.0  4.0  2.0
2021-08-11  6.0  4.0  6.0  4.0  2.0
2021-08-12  6.0  4.0  6.0  4.0  2.0
2021-08-13  6.0  4.0  6.0  4.0  2.0
2021-08-14  6.0  4.0  6.0  4.0  2.0
2021-08-15  6.0  4.0  6.0  4.0  2.0
2021-08-16  6.0  4.0  6.0  4.0  2.0
2021-08-17  6.0  4.0  6.0  4.0  2.0
2021-08-18  6.0  4.0  6.0  4.0  2.0
2021-08-19  6.0  4.0  6.0  4.0  2.0
2021-08-20  6.0  4.0  6.0  4.0  2.0
2021-08-21  6.0  4.0  6.0  4.0  2.0
2021-08-22  6.0  4.0  6.0  4.0  2.0
2021-08-23  6.0  4.0  6.0  4.0  2.0
2021-08-24  6.0  4.0  6.0  4.0  2.0
2021-08-25  6.0  4.0  6.0  4.0  2.0
2021-08-26  6.0  4.0  6.0  4.0  2.0
2021-08-27  6.0  4.0  6.0  4.0  2.0
2021-08-28  6.0  4.0  6.0  4.0  2.0
2021-08-29  6.0  4.0  6.0  4.0  2.0
2021-08-30  6.0  4.0  6.0  4.0  2.0
2021-08-31  6.0  4.0  6.0  4.0  2.0
2021-09-01  8.0  9.0  3.0  1.0  0.0
2021-09-02  4.0  6.0  6.0  6.0  4.0
2021-09-03  4.0  6.0  6.0  6.0  4.0
2021-09-04  4.0  6.0  6.0  6.0  4.0
2021-09-05  4.0  6.0  6.0  6.0  4.0
2021-09-06  4.0  6.0  6.0  6.0  4.0
2021-09-07  4.0  6.0  6.0  6.0  4.0
2021-09-08  4.0  6.0  6.0  6.0  4.0
2021-09-09  4.0  6.0  6.0  6.0  4.0
2021-09-10  4.0  6.0  6.0  6.0  4.0
2021-09-11  4.0  6.0  6.0  6.0  4.0
2021-09-12  4.0  6.0  6.0  6.0  4.0
2021-09-13  8.0  9.0  0.0  9.0  3.0

Upvotes: 1

Related Questions