How to fill missing timestamps in pandas

Question

I have a CSV file as below:

                  t  dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
0      201811170000  17   0   0.40   0.41   1.33   1.55  2.45
1      201811170002  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170007  17   0   0.40   0.37   1.35   1.45  2.40

Now I have to fill in the missing minutes by last observation carried forward. Expected output:

                  t  dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
0      201811170000  17   0   0.40   0.41   1.33   1.55  2.45
1      201811170001  17   0   0.40   0.41   1.33   1.55  2.45
2      201811170002  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170003  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170004  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170005  17   0   0.40   0.41   1.34   1.51  2.46
2      201811170006  17   0   0.40   0.41   1.34   1.51  2.46
3      201811170007  17   0   0.40   0.37   1.35   1.45  2.40

I tried following this link but unable to achieve the expected output. Sorry I'm new to coding.

jezrael · Accepted Answer

First create DatetimeIndex by to_datetime and DataFrame.set_index and then change frequency by DataFrame.asfreq:

df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().asfreq('Min', method='ffill')
print (df)
                     dd  hh  v.amm  v.alc  v.no2  v.cmo   aqi
t                                                            
2018-11-17 00:00:00  17   0    0.4   0.41   1.33   1.55  2.45
2018-11-17 00:01:00  17   0    0.4   0.41   1.33   1.55  2.45
2018-11-17 00:02:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:03:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:04:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:05:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:06:00  17   0    0.4   0.41   1.34   1.51  2.46
2018-11-17 00:07:00  17   0    0.4   0.37   1.35   1.45  2.40

Or use DataFrame.resample with Resampler.ffill:

df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().resample('Min').ffill()

How to fill missing timestamps in pandas

Answers (1)

Related Questions