Reputation: 649
I have a DataFrame that represents 1 second of data that is supposed to be sampled at 100 Hz.
I would like to:
1) resample it at a rate of 10 milliseconds, averaging ("avg") each column within each interval, and
2) add extra rows, filled by interpolation, wherever samples are missing,
as in the following:
DF_input:
ephoc_as_datatime        att1  att2
2000-01-01 11:22:37.130     0     4
2000-01-01 11:22:37.138     1     5
2000-01-01 11:22:37.149     2     6
2000-01-01 11:22:37.156     3     7
2000-01-01 11:22:37.165     4     8
2000-01-01 11:22:37.168     5     9
2000-01-01 11:22:37.169     3     7
2000-01-01 11:22:37.567     7     3
2000-01-01 11:22:38.120     8     4
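To make the example reproducible, here is a minimal sketch that rebuilds DF_input as a pandas DataFrame. It assumes the timestamps are available as strings and parsed with pd.to_datetime, which may differ from how the real data is loaded; the variable name df is also just an assumption.

```python
import pandas as pd

# Rebuild DF_input from the table in the question (values copied verbatim)
df = pd.DataFrame({
    'ephoc_as_datatime': pd.to_datetime([
        '2000-01-01 11:22:37.130',
        '2000-01-01 11:22:37.138',
        '2000-01-01 11:22:37.149',
        '2000-01-01 11:22:37.156',
        '2000-01-01 11:22:37.165',
        '2000-01-01 11:22:37.168',
        '2000-01-01 11:22:37.169',
        '2000-01-01 11:22:37.567',
        '2000-01-01 11:22:38.120',
    ]),
    'att1': [0, 1, 2, 3, 4, 5, 3, 7, 8],
    'att2': [4, 5, 6, 7, 8, 9, 7, 3, 4],
})
print(df)
```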
DF_output:
ephoc_as_datatime        att1  att2
2000-01-01 11:22:37.130     0     4
2000-01-01 11:22:37.140     1     5
2000-01-01 11:22:37.150     2     6
2000-01-01 11:22:37.160     3     7
2000-01-01 11:22:37.170     4     8
... (missing rows added here by interpolation)
2000-01-01 11:22:37.570     7     3
... (missing rows added here by interpolation)
2000-01-01 11:22:38.120     8     4
I know that I should be using resample and interpolate.
Any suggestions would be much appreciated.
Many thanks, best regards, Carlo
Upvotes: 1
Views: 45
Reputation: 862601
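I think you need resample with the 10L offset alias (10 milliseconds), followed by interpolate. In pandas 2.2 and later the 'L' alias is deprecated in favor of the equivalent '10ms':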
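import pandas as pd

# if necessary, convert the column to datetimes first
#df['ephoc_as_datatime'] = pd.to_datetime(df['ephoc_as_datatime'])
# average each column within every 10 ms bin, then linearly fill the empty bins
df = df.resample('10L', on='ephoc_as_datatime').mean().interpolate()
print(df.head(20))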
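A self-contained sketch of the same resample-then-interpolate approach on the question's sample data. It uses '10ms' instead of '10L' only to avoid the deprecation warning in newer pandas; both denote the same 10 millisecond frequency. The result name out is illustrative.

```python
import pandas as pd

# Sample data from the question (DF_input)
df = pd.DataFrame({
    'ephoc_as_datatime': pd.to_datetime([
        '2000-01-01 11:22:37.130', '2000-01-01 11:22:37.138',
        '2000-01-01 11:22:37.149', '2000-01-01 11:22:37.156',
        '2000-01-01 11:22:37.165', '2000-01-01 11:22:37.168',
        '2000-01-01 11:22:37.169', '2000-01-01 11:22:37.567',
        '2000-01-01 11:22:38.120',
    ]),
    'att1': [0, 1, 2, 3, 4, 5, 3, 7, 8],
    'att2': [4, 5, 6, 7, 8, 9, 7, 3, 4],
})

# '10ms' is the same frequency as '10L' (the 'L' alias is deprecated in newer pandas)
out = (df.resample('10ms', on='ephoc_as_datatime')
         .mean()          # average each column within every 10 ms bin
         .interpolate())  # linearly fill the bins that received no samples

print(out.head(20))
```

The result has one row per 10 ms from 11:22:37.130 to 11:22:38.120, with no NaN left after interpolate.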
att1 att2
ephoc_as_datatime
2000-01-01 11:22:37.130 0.500 4.500
2000-01-01 11:22:37.140 2.000 6.000
2000-01-01 11:22:37.150 3.000 7.000
2000-01-01 11:22:37.160 4.000 8.000
2000-01-01 11:22:37.170 4.075 7.875
2000-01-01 11:22:37.180 4.150 7.750
2000-01-01 11:22:37.190 4.225 7.625
2000-01-01 11:22:37.200 4.300 7.500
2000-01-01 11:22:37.210 4.375 7.375
2000-01-01 11:22:37.220 4.450 7.250
2000-01-01 11:22:37.230 4.525 7.125
2000-01-01 11:22:37.240 4.600 7.000
2000-01-01 11:22:37.250 4.675 6.875
2000-01-01 11:22:37.260 4.750 6.750
2000-01-01 11:22:37.270 4.825 6.625
2000-01-01 11:22:37.280 4.900 6.500
2000-01-01 11:22:37.290 4.975 6.375
2000-01-01 11:22:37.300 5.050 6.250
2000-01-01 11:22:37.310 5.125 6.125
2000-01-01 11:22:37.320 5.200 6.000
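The constant increments in the interpolated rows (+0.075 for att1, -0.125 for att2) come from linear interpolation between the last non-empty bin before the gap (.160, means 4.0 and 8.0) and the next non-empty one, which should be .560 because the 37.567 sample falls in [.560, .570). A small arithmetic check, assuming those endpoint values read off the resampled means:

```python
# Endpoints of the gap, read off the resampled means (times in ms past 11:22:37)
t_start, t_end = 160, 560
att1_start, att1_end = 4.0, 7.0
att2_start, att2_end = 8.0, 3.0

# Number of 10 ms steps spanning the gap
n_steps = (t_end - t_start) // 10

# Per-row increment produced by linear interpolation
step1 = (att1_end - att1_start) / n_steps
step2 = (att2_end - att2_start) / n_steps

print(n_steps, step1, step2)
# First interpolated row (.170)
print(att1_start + step1, att2_start + step2)
```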
Detail (the bin means before interpolation; bins without samples are NaN):
print(df.resample('10L', on='ephoc_as_datatime').mean().head(20))
att1 att2
ephoc_as_datatime
2000-01-01 11:22:37.130 0.5 4.5
2000-01-01 11:22:37.140 2.0 6.0
2000-01-01 11:22:37.150 3.0 7.0
2000-01-01 11:22:37.160 4.0 8.0
2000-01-01 11:22:37.170 NaN NaN
2000-01-01 11:22:37.180 NaN NaN
2000-01-01 11:22:37.190 NaN NaN
2000-01-01 11:22:37.200 NaN NaN
2000-01-01 11:22:37.210 NaN NaN
2000-01-01 11:22:37.220 NaN NaN
2000-01-01 11:22:37.230 NaN NaN
2000-01-01 11:22:37.240 NaN NaN
2000-01-01 11:22:37.250 NaN NaN
2000-01-01 11:22:37.260 NaN NaN
2000-01-01 11:22:37.270 NaN NaN
2000-01-01 11:22:37.280 NaN NaN
2000-01-01 11:22:37.290 NaN NaN
2000-01-01 11:22:37.300 NaN NaN
2000-01-01 11:22:37.310 NaN NaN
2000-01-01 11:22:37.320 NaN NaN
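A sketch for inspecting this intermediate step: it counts the empty bins and checks that, because the 10 ms bins are evenly spaced, the default interpolate() (method='linear', which treats rows as equally spaced) gives the same values as interpolate(method='time'), which weights by the actual timestamps. The variable names are illustrative.

```python
import pandas as pd

# Sample data from the question (DF_input)
df = pd.DataFrame({
    'ephoc_as_datatime': pd.to_datetime([
        '2000-01-01 11:22:37.130', '2000-01-01 11:22:37.138',
        '2000-01-01 11:22:37.149', '2000-01-01 11:22:37.156',
        '2000-01-01 11:22:37.165', '2000-01-01 11:22:37.168',
        '2000-01-01 11:22:37.169', '2000-01-01 11:22:37.567',
        '2000-01-01 11:22:38.120',
    ]),
    'att1': [0, 1, 2, 3, 4, 5, 3, 7, 8],
    'att2': [4, 5, 6, 7, 8, 9, 7, 3, 4],
})

# Bin means only; bins that received no samples come out as NaN
binned = df.resample('10ms', on='ephoc_as_datatime').mean()
n_empty = int(binned['att1'].isna().sum())
print(n_empty, 'of', len(binned), 'bins are empty')

# Default 'linear' ignores the index values and treats rows as equally spaced
linear = binned.interpolate()
# 'time' interpolates using the DatetimeIndex values
timed = binned.interpolate(method='time')

# With evenly spaced bins both methods agree (within float tolerance)
pd.testing.assert_frame_equal(linear, timed)
print('linear and time interpolation match')
```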
Upvotes: 2