Case Msee

Reputation: 405

Fill missing date and time in Python (pandas)

I have a large data set; a sample is given below. The data are recorded at 5-minute intervals over one 24-hour day for 3214 unique IDs.

I tried the relevant answers (1, 2), but they did not work.

The most relevant answer was here, but its solutions only handle missing dates, not both dates and times.

sample_data.csv

datetime          Unique_ID  VALUE
28/01/2018 2:05   105714     284
28/01/2018 2:10   105714     292
28/01/2018 2:15   105714     287
28/01/2018 2:20   105714     193
28/01/2018 2:25   105714     2348
28/01/2018 2:30   105714     284
28/01/2018 2:35   105714     286
28/01/2018 2:40   105714     293
28/01/2018 2:45   105714     229
28/01/2018 2:50   105714     2292
28/01/2018 2:55   105714     225
28/01/2018 4:45   105714     185
28/01/2018 4:50   105714     277
28/01/2018 4:55   105714     185
28/01/2018 5:00   105714     2435
28/01/2018 5:05   105714     273
28/01/2018 5:10   105714     2360
28/01/2018 5:15   105714     282
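
To make the gap in the sample concrete, here is a minimal sketch that lists the 5-minute slots missing between the first and last reading of each ID. It only uses four of the rows above, and it assumes comma-separated input and a day-first date format; `missing_slots` is a hypothetical helper name, not something pandas provides.

```python
import io
import pandas as pd

# Four rows from the sample, written as comma-separated text (an assumption;
# the real file may use another delimiter).
raw = io.StringIO(
    "datetime,Unique_ID,VALUE\n"
    "28/01/2018 2:50,105714,2292\n"
    "28/01/2018 2:55,105714,225\n"
    "28/01/2018 4:45,105714,185\n"
    "28/01/2018 4:50,105714,277\n"
)
df = pd.read_csv(raw)

# Parse the day-first timestamps explicitly so 28/01 is not misread.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M")


def missing_slots(group, freq="5min"):
    """Timestamps absent between the group's first and last reading."""
    full = pd.date_range(group["datetime"].min(), group["datetime"].max(), freq=freq)
    return full.difference(pd.DatetimeIndex(group["datetime"]))


# One DatetimeIndex of missing slots per Unique_ID.
gaps = {uid: missing_slots(g) for uid, g in df.groupby("Unique_ID")}
print(gaps[105714])
```

For these rows the missing slots run from 3:00 to 4:40, which matches the jump from 2:55 to 4:45 in the sample.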


Upvotes: 1

Views: 676

Answers (1)

Anurag Dabas

Reputation: 24322

In pandas you can try the following.

First, get the index of the rows in which every column is NaN:

idx = df[df.isna().all(axis=1)].index

Then forward-fill the 'Unique_ID' and 'datetime' columns with the ffill() method:

df[['Unique_ID','datetime']]=df[['Unique_ID','datetime']].ffill()

Finally, use the groupby() and transform() methods to copy the last non-null 'VALUE' of each group into the filled rows:

df['VALUE'] = df.groupby(['Unique_ID', 'datetime'], dropna=False)['VALUE'].transform('last')

Now, if you want to move the forward-filled 'datetime' values on by 5 minutes (the column must already be a datetime dtype, so convert it with pd.to_datetime() first if it is still text):

df.loc[idx, 'datetime'] = df.loc[idx, 'datetime'] + pd.to_timedelta(5, unit='m')
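
Putting the steps together, here is a minimal end-to-end sketch on a toy frame. It assumes the missing reading is already present as a row in which every column is NaN, and that 'datetime' has been parsed to a datetime dtype. The frame is made up for illustration, and `pd.Timedelta(minutes=5)` is used as an equivalent of the `pd.to_timedelta` call.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative only): row 2 is an all-NaN placeholder
# standing in for one missing reading.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2018-01-28 02:50", "2018-01-28 02:55", None, "2018-01-28 04:45"]
    ),
    "Unique_ID": [105714, 105714, np.nan, 105714],
    "VALUE": [2292, 225, np.nan, 185],
})

# 1. Remember which rows are entirely empty.
idx = df[df.isna().all(axis=1)].index

# 2. Forward-fill the id and timestamp from the previous row.
df[["Unique_ID", "datetime"]] = df[["Unique_ID", "datetime"]].ffill()

# 3. Within each (id, timestamp) group, broadcast the last non-null VALUE,
#    which fills the placeholder with the previous reading.
df["VALUE"] = (
    df.groupby(["Unique_ID", "datetime"], dropna=False)["VALUE"].transform("last")
)

# 4. Shift only the filled rows forward by one 5-minute step.
df.loc[idx, "datetime"] = df.loc[idx, "datetime"] + pd.Timedelta(minutes=5)

print(df)
```

Note that with a run of several consecutive placeholder rows, every row in the run is forward-filled from the same reading, so step 4 would give them all the same timestamp; the sketch covers the single-placeholder case only.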

Upvotes: 1
