How to clean up or smoothen a time series using two criteria in Pandas

Question

Sorry for the confusing title. I'm trying to clean up a dataset that has engine hours reported on different time intervals. I'm trying to detect and address two situations:

Engine hours reported are less than last records engine hours

Engine hours reported between two dates is greater than the hour difference between said dates

Sampled Date    Meter   eng_hours_diff  date_hours_diff
 2017-02-02     5336         24                24
 2017-02-20     5578         242               432
 2017-02-22     5625         47                48
 2017-03-07     5930         305               312
 2017-05-16     6968         1038              1680
 2017-06-01     7182         214               384
 2017-06-22     7527         345               504
 2017-07-10     7919         392               432
 2017-07-25     16391        8472              360
 2017-08-20     8590        -7801              624
 2017-09-05     8827         237               384
 2017-09-26     9106         279               504
 2017-10-16     9406         300               480
 2017-10-28     9660         254               288
 2017-11-29     10175        515               768

What I would like to do is re-write the ['Meter'] series if either of the two scenarios above come up and take the average between the points around it.

I'm thinking that this might require two steps, one to eliminate any inaccuracy due to the difference in engine hours being > than the hours between the dates, and then re-calculate the ['eng_hours_diff'] column and check if there are still any that are negative.

The last two columns I've calculated like this:

dfa['eng_hours_diff'] = dfa['Meter'].diff().fillna(24)

dfa['date_hours_diff'] = dfa['Sampled Date'].diff().apply(lambda x:str(x)).apply(lambda x: x.split(' ')[0]).apply(lambda x:x.replace('NaT',"1")).apply(lambda x: int(x)*24)

EDIT: Ok I think I'm getting somewhere but not quite there yet..

dfa['MeterReading'] = [x if y>0 & y


EDIT 2:
I'm much closer thanks to Bill's answer.
Applying this function will replace any record that doesn't meet the criteria with a zero. Then I'm replacing those zeros with np.nan and using the interpolate method.
The only thing that I'm missing is how to fill out the last values when they also come as np.nan, I'm looking to see if there's an extrapolate method.
Here is the function in case anyone stumbles upon a similar problem in the future:
dfa['MeterReading'] = dfa['MeterReading'].replace({0:np.nan}).interpolate(method='polynomial', order=2, limit=5, limit_direction='both').bfill()

This is the issue that I'm having at the end. Two values were missed but since the difference becomes negative it discards all 4.

How to clean up or smoothen a time series using two criteria in Pandas

Answers (1)

Related Questions