Reputation: 99
How to impute the Missed dates with next Dates in a data frame?
wtg_at1.tail(10)
AmbientTemperatue | Date | |
---|---|---|
818 | 31.237499 | 2020-03-28 |
819 | 32.865974 | 2020-03-29 |
820 | 32.032558 | 2020-03-30 |
821 | 31.671166 | NaN |
822 | 31.389927 | NaN |
823 | 31.243660 | NaN |
824 | 31.206777 | NaN |
825 | 31.241503 | NaN |
826 | 31.309531 | NaN |
827 | 31.382531 | NaN |
I am expecting my output data frame something similar to below. After 30th March, I am expecting next dates from 31st March.
AmbientTemperatue | Date | |
---|---|---|
818 | 31.237499 | 2020-03-28 |
819 | 32.865974 | 2020-03-29 |
820 | 32.032558 | 2020-03-30 |
821 | 31.671166 | 2020-03-31 |
822 | 31.389927 | 2020-04-01 |
823 | 31.243660 | 2020-04-02 |
824 | 31.206777 | 2020-04-03 |
825 | 31.241503 | 2020-04-04 |
826 | 31.309531 | 2020-04-05 |
827 | 31.382531 | 2020-04-06 |
I tried below code but not giving desired output.
wtg_at1.append(pd.DataFrame({'Date': pd.date_range(start=wtg_at1.Date.iloc[-8], periods=7, freq='D', closed='right')}))
wtg_at1
AmbientTemperatue | Date | |
---|---|---|
0 | 32.032558 | 2017-12-31 |
1 | 26.667757 | 2018-01-01 |
2 | 25.655754 | 2018-01-02 |
3 | 25.514013 | 2018-01-03 |
4 | 24.927652 | 2018-01-04 |
... | ... | ... |
823 | 31.243660 | NaN |
824 | 31.206777 | NaN |
825 | 31.241503 | NaN |
826 | 31.309531 | NaN |
827 | 31.382531 | NaN |
Upvotes: 0
Views: 131
Reputation: 862441
If there is only one group of missing values is possible forward filling them and add counter by cumulative sum converted to days timedeltas:
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].ffill() + pd.to_timedelta(df['Date'].isna().cumsum(), unit='d')
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 2020-03-31
822 31.389927 2020-04-01
823 31.243660 2020-04-02
824 31.206777 2020-04-03
825 31.241503 2020-04-04
826 31.309531 2020-04-05
827 31.382531 2020-04-06
Another possible idea is reassign values by minimal datetime and length of DataFrame
:
df['Date'] = pd.date_range(df['Date'].min(), periods=len(df))
If there is multiple groups with missing values:
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 NaN
822 31.389927 NaN
823 31.243660 NaN
824 31.206777 2020-05-08
825 31.241503 NaN
826 31.309531 NaN
827 31.382531 NaN
df['Date'] = pd.to_datetime(df['Date'])
m = df['Date'].notna()
s = (~m).groupby(m.cumsum()).cumsum()
df['Date'] = df['Date'].ffill() + pd.to_timedelta(s, unit='d')
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 2020-03-31
822 31.389927 2020-04-01
823 31.243660 2020-04-02
824 31.206777 2020-05-08
825 31.241503 2020-05-09
826 31.309531 2020-05-10
827 31.382531 2020-05-11
Upvotes: 1