Upendra Dama
Upendra Dama

Reputation: 99

Add Future dates for missing Rows in a Dataframe

How to impute the Missed dates with next Dates in a data frame?

wtg_at1.tail(10)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 NaN
822 31.389927 NaN
823 31.243660 NaN
824 31.206777 NaN
825 31.241503 NaN
826 31.309531 NaN
827 31.382531 NaN

I am expecting my output data frame something similar to below. After 30th March, I am expecting next dates from 31st March.

AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 2020-03-31
822 31.389927 2020-04-01
823 31.243660 2020-04-02
824 31.206777 2020-04-03
825 31.241503 2020-04-04
826 31.309531 2020-04-05
827 31.382531 2020-04-06

I tried below code but not giving desired output.

wtg_at1.append(pd.DataFrame({'Date': pd.date_range(start=wtg_at1.Date.iloc[-8], periods=7, freq='D', closed='right')}))
wtg_at1
AmbientTemperatue Date
0 32.032558 2017-12-31
1 26.667757 2018-01-01
2 25.655754 2018-01-02
3 25.514013 2018-01-03
4 24.927652 2018-01-04
... ... ...
823 31.243660 NaN
824 31.206777 NaN
825 31.241503 NaN
826 31.309531 NaN
827 31.382531 NaN

Upvotes: 0

Views: 131

Answers (1)

jezrael
jezrael

Reputation: 862441

If there is only one group of missing values is possible forward filling them and add counter by cumulative sum converted to days timedeltas:

df['Date'] = pd.to_datetime(df['Date'])

df['Date'] = df['Date'].ffill() + pd.to_timedelta(df['Date'].isna().cumsum(), unit='d')
print (df)
     AmbientTemperatue       Date
818          31.237499 2020-03-28
819          32.865974 2020-03-29
820          32.032558 2020-03-30
821          31.671166 2020-03-31
822          31.389927 2020-04-01
823          31.243660 2020-04-02
824          31.206777 2020-04-03
825          31.241503 2020-04-04
826          31.309531 2020-04-05
827          31.382531 2020-04-06

Another possible idea is reassign values by minimal datetime and length of DataFrame:

df['Date'] = pd.date_range(df['Date'].min(), periods=len(df))

If there is multiple groups with missing values:

print (df)
     AmbientTemperatue        Date
818          31.237499  2020-03-28
819          32.865974  2020-03-29
820          32.032558  2020-03-30
821          31.671166         NaN
822          31.389927         NaN
823          31.243660         NaN
824          31.206777  2020-05-08
825          31.241503         NaN
826          31.309531         NaN
827          31.382531         NaN

df['Date'] = pd.to_datetime(df['Date'])

m = df['Date'].notna()
s = (~m).groupby(m.cumsum()).cumsum()
df['Date'] = df['Date'].ffill() + pd.to_timedelta(s, unit='d')
print (df)
    AmbientTemperatue       Date
818          31.237499 2020-03-28
819          32.865974 2020-03-29
820          32.032558 2020-03-30
821          31.671166 2020-03-31
822          31.389927 2020-04-01
823          31.243660 2020-04-02
824          31.206777 2020-05-08
825          31.241503 2020-05-09
826          31.309531 2020-05-10
827          31.382531 2020-05-11

Upvotes: 1

Related Questions