Reputation: 101
I have a time series that is very irregular. The difference in time, between two records can be 1s or 10 days.
I want to resample the data every 1h, but only when the sequential records are less than 1h.
How to approach this, without making too many loops?
In the example above, I would like to resample only rows 5-6 (delta difference is 10s) and rows 6-7 (delta difference is 50min). The others should remain as they are.
tmp=vals[['datumtijd','filter data']]
datumtijd filter data
0 1970-11-01 00:00:00 129.0
1 1970-12-01 00:00:00 143.0
2 1971-01-05 00:00:00 151.0
3 1971-02-01 00:00:00 151.0
4 1971-03-01 00:00:00 163.0
5 1971-03-01 00:00:10 163.0
6 1971-03-01 00:00:20 163.0
7 1971-03-01 00:01:10 163.0
8 1971-03-01 00:04:10 163.0
.. ... ...
244 1981-08-19 00:00:00 102.0
245 1981-09-02 00:00:00 98.0
246 1981-09-17 00:00:00 92.0
247 1981-10-01 00:00:00 89.0
248 1981-10-19 00:00:00 92.0
Upvotes: 1
Views: 313
Reputation: 8790
You can be a little explicit about this by using groupby
on the hour-floor of the time stamps:
grouped = df.groupby(df['datumtijd'].dt.floor('1H')).mean()
This is explicitly looking for the hour of each existing data point and grouping the matching ones.
But you can also just do the resample and then filter out the empty data, as pandas
can still do this pretty quickly:
resampled = df.resample('1H', on='datumtijd').mean().dropna()
In either case, you get the following (note that I changed the last time stamp just so that the console would show the hours):
filter data
datumtijd
1970-11-01 00:00:00 129.0
1970-12-01 00:00:00 143.0
1971-01-05 00:00:00 151.0
1971-02-01 00:00:00 151.0
1971-03-01 00:00:00 163.0
1981-08-19 00:00:00 102.0
1981-09-02 00:00:00 98.0
1981-09-17 00:00:00 92.0
1981-10-01 00:00:00 89.0
1981-10-19 03:00:00 92.0
One quick clarification also. In your example, rows 5-8 all occur within the same hour, so they all get grouped together (hour:minute:second)!.
Also, see this related post.
Upvotes: 1