Reputation: 3842
I'm dealing with an hourly time series that runs from 2016-01-01 00:00 to 2016-01-07 23:00, with some features corresponding to each timeframe.
Ideally there would be 7 x 24 rows of data covering the whole period. But the hourly data can be interrupted for some reason (e.g. when capturing hourly weather information, the website may be broken during specific hours).
My dataframe currently contains only the valid rows, with the missing hours absent. Its length is 7 * 24 - 5, which means 5 hourly records are missing.
The datetimes are saved in the format 201601010100, representing 2016-01-01 01:00:00.
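Stamps in this format can be parsed into real timestamps with an explicit format string, for example (a minimal sketch with two made-up rows):

```python
import pandas as pd

# 201601010100 -> 2016-01-01 01:00:00, parsed with an explicit format
stamps = pd.Series([201601010100, 201601010200])
parsed = pd.to_datetime(stamps.astype(str), format="%Y%m%d%H%M")
print(parsed.iloc[0])  # 2016-01-01 01:00:00
```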
I have tried to generate a new dataframe of length 7 * 24 containing the continuous hourly datetimes:
import pandas as pd
import numpy as np

rng = pd.date_range('2016-01-01', periods=7 * 24, freq='H')
new_date = list(rng.strftime('%Y%m%d%H'))
ff = pd.DataFrame({"datetime": new_date, "feature1": np.nan, "feature2": np.nan})
The original dataframe is df, which contains the columns datetime, feature1, and feature2.
I tried f3 = ff.merge(df, on='datetime'). But f3 still did not contain all 7 * 24 hours with the features filled in.
Because in practice the temporal coverage and feature dimension are large, I don't want to loop over the two dataframes and compare their datetime values one by one. I want to fill the feature values in ff from the rows of df with identical datetimes, and keep NaN for the missing hours.
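A left join on the datetime column does exactly that: it keeps every row of the skeleton frame and leaves NaN where the original frame has no match (a minimal sketch with made-up values and only one feature column):

```python
import pandas as pd

# Skeleton with every hour; the original data is missing hour 03
ff = pd.DataFrame({"datetime": ["2016010101", "2016010102",
                                "2016010103", "2016010104"]})
df = pd.DataFrame({"datetime": ["2016010101", "2016010102", "2016010104"],
                   "feature1": [10, 20, 40]})

# how='left' keeps all rows of ff; unmatched hours get NaN features
f3 = ff.merge(df, on="datetime", how="left")
print(len(f3))                      # 4
print(f3["feature1"].isna().sum())  # 1
```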
Upvotes: 1
Views: 3070
Reputation: 22443
Missing data is pretty common when dealing with date/time-indexed data. There are no perfect solutions--the perfect solution would have been to have all the data in the first place, on exactly the cadence you wanted--but there are ways of managing missing data.
Let's start with some sample data:
import pandas as pd
import numpy as np
data = {"datetime": [201601010100, 201601010200, 201601010400, 201601010500],
        "feature1": np.random.randint(1, 100, 4),
        "feature2": np.random.rand(4)}
ff = pd.DataFrame(data)
Let's give it a real DatetimeIndex:
ff.index = pd.to_datetime(ff.datetime.astype(str))
ff.index.name = None
And finally reindex with a similar index that has all the hours:
di = pd.date_range(start=ff.index[0],
                   end=ff.index[-1],
                   freq='H')
ff = ff.reindex(di, fill_value='--missing--')
Voila! Your data is presented with the missing data points clearly marked. As a final grace note, you might decide to remove the original low-level timestamps:
del ff['datetime']
Yielding a frame in which every hour from 01:00 to 05:00 appears, with the missing 03:00 row marked --missing--.
There are still some complexities to worry about. Pandas isn't perfect, and it struggles with columns that can contain multiple types. It's fundamentally tied to NumPy arrays as its underlying storage engine, which demands homogeneous arrays. So when you reindex like this, you introduce NaN or another designated missing value into your other columns. I chose a string marker for the missing values, which looks nice, but it recasts all your columns to object dtype, which has performance implications if your data frame is large. The default alternative, inserting NaN for missing values, recasts all your numerical columns to floating point. One or the other may be more palatable to you. The floating-point route works well, e.g., if all your feature columns are floating point already. But neither is perfect.
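The dtype consequences are easy to demonstrate (a small sketch with fixed values instead of the random sample above; column names are illustrative):

```python
import pandas as pd

ff = pd.DataFrame({"feature1": [7, 9], "feature2": [0.1, 0.2]},
                  index=pd.to_datetime(["2016-01-01 01:00",
                                        "2016-01-01 03:00"]))
di = pd.date_range(ff.index[0], ff.index[-1], freq="h")

as_nan = ff.reindex(di)                            # default NaN fill
as_str = ff.reindex(di, fill_value="--missing--")  # string fill

print(as_nan["feature1"].dtype)  # float64: the int column was upcast
print(as_str["feature1"].dtype)  # object: everything was upcast
```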
Upvotes: 2
Reputation: 57033
Create an empty dataframe with rng as index:
skeleton = pd.DataFrame(index=rng)
Convert the original dates to datetime64 values so they are comparable with the time range:
df['datetime_ns'] = pd.to_datetime(df['datetime'].astype(str), format='%Y%m%d%H%M')
Perform an outer join of the frames on index and datetime_ns:
new_df = df.merge(skeleton, left_on='datetime_ns',right_index=True,how='outer')
Sort the new dataframe, if necessary:
new_df.sort_values('datetime_ns', inplace=True)
Upvotes: 1