adrian
adrian

Reputation: 148

How to fill a time series that's missing data but only when the gap is smaller than a certain number?

Good afternoon,

I'm working on pre-processing data that's streaming from sensors and which normally comes in every second (1hz). However this is not always the case, there are instances when there is gaps of data of 2s, 3s and even more.

I'm trying to set up some code that fills these gaps but only when they're smaller than some number, say 10 seconds.

The data comes in as follows:

     Timestamp           Sensor1    Sensor2    Sensor3
7/1/2020 00:00:00           5         135        77
7/1/2020 00:00:01           6         118        79
7/1/2020 00:00:02           4         131        75
7/1/2020 00:00:04           3         125        78
7/1/2020 00:00:05           9         145        67
7/1/2020 00:00:06           6         136        71
7/1/2020 00:00:10           7         141        77
7/1/2020 00:00:11           4         145        72

What I'd like to do is fill the dataframe whenever the missed window is smaller than 10 seconds, and fill it with the average of the two neighboring values.

     Timestamp           Sensor1    Sensor2    Sensor3
7/1/2020 00:00:00           5         135        77
7/1/2020 00:00:01           6         118        79
7/1/2020 00:00:02           4         131        75
7/1/2020 00:00:03           3.5       128        76.5
7/1/2020 00:00:04           3         125        78
7/1/2020 00:00:05           9         145        67
7/1/2020 00:00:06           6         136        71
7/1/2020 00:00:07           6.5       138.5      74
7/1/2020 00:00:08           6.5       138.5      74
7/1/2020 00:00:09           6.5       138.5      74
7/1/2020 00:00:10           7         141        77
7/1/2020 00:00:11           4         145        72

I think once I can set up the proper time "grid" with no missing seconds filling it should be relatively simple using the fill method. But how do I tell it to only fill windows smaller than 10 seconds?

Thanks in advance

Upvotes: 1

Views: 343

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Try:

#Make sure Timestamp is datetime
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

#Create DatetimeIndex by moving Timestamp into index
df = df.set_index('Timestamp')

#Resample dataframe with freq seconds
dfs = df.resample('S')

#Calculate average using forward fill and bfill get last and next non-NaN values
df_out = (dfs.ffill()+dfs.bfill()) / 2
df_out

Output:

                     Sensor1  Sensor2  Sensor3
Timestamp                                     
2020-07-01 00:00:00      5.0    135.0     77.0
2020-07-01 00:00:01      6.0    118.0     79.0
2020-07-01 00:00:02      4.0    131.0     75.0
2020-07-01 00:00:03      3.5    128.0     76.5
2020-07-01 00:00:04      3.0    125.0     78.0
2020-07-01 00:00:05      9.0    145.0     67.0
2020-07-01 00:00:06      6.0    136.0     71.0
2020-07-01 00:00:07      6.5    138.5     74.0
2020-07-01 00:00:08      6.5    138.5     74.0
2020-07-01 00:00:09      6.5    138.5     74.0
2020-07-01 00:00:10      7.0    141.0     77.0
2020-07-01 00:00:11      4.0    145.0     72.0

Upvotes: 0

Ben.T
Ben.T

Reputation: 29635

without the 10s missing window, it is something with resample and interpolate.

df.set_index('Timestamp').resample('s').interpolate().reset_index()

To add the filling only when less than 10s missing, then you can use groupby and get a new group where the diff between 2 rows are less than 10 in seconds. Note: to see it, I changed in your data 10 by 20 and 11 by 22 in the last two timestamps.

print (df.set_index('Timestamp')
         .groupby(df['Timestamp'].diff().dt.total_seconds()
                                 .gt(10).cumsum()
                                 .to_numpy())
         .apply(lambda x: x.resample('s').interpolate())
         .reset_index()
         .drop('level_0', axis=1)
      )
            Timestamp  Sensor1  Sensor2  Sensor3
0 2020-07-01 00:00:00      5.0    135.0     77.0
1 2020-07-01 00:00:01      6.0    118.0     79.0
2 2020-07-01 00:00:02      4.0    131.0     75.0
3 2020-07-01 00:00:03      3.5    128.0     76.5
4 2020-07-01 00:00:04      3.0    125.0     78.0
5 2020-07-01 00:00:05      9.0    145.0     67.0
6 2020-07-01 00:00:06      6.0    136.0     71.0
7 2020-07-01 00:00:20      7.0    141.0     77.0 
8 2020-07-01 00:00:21      5.5    143.0     74.5
9 2020-07-01 00:00:22      4.0    145.0     72.0

Upvotes: 1

Related Questions