Reputation: 485
I have the following dataset in a DataFrame:
Time_stamp x y
'2012-01-01 00:00:00' 8.97 1310.03
'2012-01-01 00:10:00' 9.91 1684.52
'2012-01-01 00:40:00' 9.64 1532.05
'2012-01-01 00:50:00' 11.84 1997.87
'2012-01-01 00:60:00' 11.69 2135.76
'2012-01-01 01:00:00' 12.14 2149.54
'2012-01-01 01:10:00' 13.43 2056.35
'2012-01-01 01:20:00' 9.88 1633.45
'2012-01-01 01:30:00' 9.01 1315.85
'2012-01-01 01:50:00' 8.33 1141.84
As you can see, the data is recorded every 10 minutes. However, some timestamps and their corresponding values are missing, for example `'2012-01-01 00:20:00'` and `'2012-01-01 00:30:00'`. I would like to find such missing timestamps and fill their corresponding values with `nan`. Something like this:
timestamp x y
`'2012-01-01 00:20:00'` nan nan
`'2012-01-01 00:30:00'` nan nan
Any idea how to do this efficiently, without many lines of code?
Upvotes: 1
Views: 788
Reputation: 862681
First convert the values to datetimes. The `60` minutes in `2012-01-01 00:60:00` is not valid, so that value is replaced by `NaT`. Remove the missing values (`NaT`), then create a `DatetimeIndex` and add the missing datetimes with `DataFrame.asfreq`:
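import pandas as pd
# Strip the surrounding quotes and parse; invalid timestamps such as 00:60:00 become NaT
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'].str.strip("'"), errors='coerce')
# Drop the NaT rows, index by timestamp, then insert the missing 10-minute steps as NaN rows
df = df.dropna(subset=['Time_stamp']).set_index('Time_stamp').asfreq('10Min')
print(df)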
x y
Time_stamp
2012-01-01 00:00:00 8.97 1310.03
2012-01-01 00:10:00 9.91 1684.52
2012-01-01 00:20:00 NaN NaN
2012-01-01 00:30:00 NaN NaN
2012-01-01 00:40:00 9.64 1532.05
2012-01-01 00:50:00 11.84 1997.87
2012-01-01 01:00:00 12.14 2149.54
2012-01-01 01:10:00 13.43 2056.35
2012-01-01 01:20:00 9.88 1633.45
2012-01-01 01:30:00 9.01 1315.85
2012-01-01 01:40:00 NaN NaN
2012-01-01 01:50:00 8.33 1141.84
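If you also want the list of missing timestamps themselves (the "find" part of the question), here is a minimal self-contained sketch. It rebuilds the sample data from the question, and instead of `asfreq` it compares the index against a full 10-minute grid built with `pd.date_range`, then uses `reindex` to insert the `NaN` rows. The names `raw`, `clean`, `full_index`, `missing` and `filled` are only illustrative, and it assumes the grid should run from the first to the last valid timestamp.

```python
import pandas as pd

# Sample data copied from the question, including the invalid '00:60:00' entry.
raw = pd.DataFrame({
    "Time_stamp": [
        "'2012-01-01 00:00:00'", "'2012-01-01 00:10:00'",
        "'2012-01-01 00:40:00'", "'2012-01-01 00:50:00'",
        "'2012-01-01 00:60:00'", "'2012-01-01 01:00:00'",
        "'2012-01-01 01:10:00'", "'2012-01-01 01:20:00'",
        "'2012-01-01 01:30:00'", "'2012-01-01 01:50:00'",
    ],
    "x": [8.97, 9.91, 9.64, 11.84, 11.69, 12.14, 13.43, 9.88, 9.01, 8.33],
    "y": [1310.03, 1684.52, 1532.05, 1997.87, 2135.76,
          2149.54, 2056.35, 1633.45, 1315.85, 1141.84],
})

# Parse the strings; the invalid timestamp becomes NaT and its row is dropped.
ts = pd.to_datetime(raw["Time_stamp"].str.strip("'"), errors="coerce")
clean = (
    raw.assign(Time_stamp=ts)
       .dropna(subset=["Time_stamp"])
       .set_index("Time_stamp")
)

# Full 10-minute grid between the first and last valid timestamps (assumption).
full_index = pd.date_range(clean.index.min(), clean.index.max(), freq="10min")

# Timestamps that are on the grid but absent from the data.
missing = full_index.difference(clean.index)

# Reindexing onto the grid inserts NaN rows for the missing timestamps.
filled = clean.reindex(full_index)

print(missing)
print(filled)
```

On the sample data, `missing` should hold 00:20, 00:30 and 01:40, which are the same rows that show up as `NaN` in the output above.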
Upvotes: 1