Patryk

Reputation: 31

Average a big, irregularly spaced time series in 10-minute intervals to produce a 24-hour profile

I'm having trouble calculating the average temperature per 10-minute interval within a 24-hour vector.

I have a time series stored in a data frame in a proper POSIX format. The only problem is that the data are irregularly spaced (10 ± 3 min).

I know how to average them by hour, day, month, or year, but I need to get averages for a profile of 24, 72, or 168 hours.

For example, for the 168-hour profile I want to average every Monday in the observation period at 00:00:00, then at 00:10:00, 00:20:00, etc., then every Tuesday, Wednesday, and so on.

So my averaged data must fit into a regular 24/72/168-hour vector.
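
To make the grouping concrete, this is roughly what I mean for the 168-hour case (just a sketch, assuming a data frame df with the POSIXct column Date_Time_Stamp shown further down):

# Each observation would fall into a (weekday, 10-minute slot) bin,
# giving 7 * 144 = 1008 bins for the weekly profile.
weekday <- format(df$Date_Time_Stamp, "%u")                    # 1 = Monday ... 7 = Sunday
slot <- (as.integer(format(df$Date_Time_Stamp, "%H")) * 60 +
         as.integer(format(df$Date_Time_Stamp, "%M"))) %/% 10  # 0 ... 143 within each day
profile_bin <- paste(weekday, slot, sep = "-")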

For 24 hours, the vector will be defined like this:

seq(ISOdatetime(2001,2,3,0,0,0), ISOdatetime(2001,2,4,0,0,0), by=(60*5))

Resulting in a regular 24-hour vector (source of this solution here):

  [1] "2001-02-03 00:00:00 PST" "2001-02-03 00:05:00 PST"
  [3] "2001-02-03 00:10:00 PST" "2001-02-03 00:15:00 PST"
  [5] "2001-02-03 00:20:00 PST" "2001-02-03 00:25:00 PST"
  [7] "2001-02-03 00:30:00 PST" "2001-02-03 00:35:00 PST"
  [9] "2001-02-03 00:40:00 PST" "2001-02-03 00:45:00 PST"

The problem is that the timestamps of my data shift with each day, as you can see from the samples below. On 2016-09-01 the first reading (supposed to be at 00:00:00) is at 00:01:00; the next day it is at 00:04:00, the next day at 00:07:00, and so on.
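
Just to illustrate the kind of regularisation I have in mind (a sketch using lubridate::round_date on the data frame described below, not something I have working), the drifting readings could first be snapped to the nearest 10-minute mark:

library(lubridate)
# snap each irregular timestamp to its nearest 10-minute mark
df$Date_Time_Rounded <- round_date(df$Date_Time_Stamp, "10 minutes")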

I tried xts and zoo with no success, as the limit of aggregation is one hour and I need to define it in minutes.

The multiple answers I have found deal with averaging time series over a continuous set of data (example1, example2).

Unfortunately, I couldn't find an answer to my problem.

Structure of sample database:

'data.frame':   9490 obs. of  2 variables:
 $ Date_Time_Stamp : POSIXct, format: "2016-09-01 00:01:00" "2016-09-01 00:11:00" "2016-09-01 00:22:00" "2016-09-01 00:32:00" ...
 $ Signal_Raw_Value: num  778 694 592 523 567 ...

My data look like this (head):

Date_Time_Stamp Signal_Raw_Value
1 2016-09-01 00:01:00           777.51
2 2016-09-01 00:11:00           694.38
3 2016-09-01 00:22:00           591.69
4 2016-09-01 00:32:00           523.23
5 2016-09-01 00:42:00           567.24
6 2016-09-01 00:52:00           547.68

Tail:

Date_Time_Stamp Signal_Raw_Value
9485 2016-11-06 23:02:00           660.15
9486 2016-11-06 23:12:00           635.70
9487 2016-11-06 23:22:00           498.78
9488 2016-11-06 23:32:00           415.65
9489 2016-11-06 23:42:00           425.43
9490 2016-11-06 23:53:00           440.10

First hour of 2016-09-01

Date_Time_Stamp Signal_Raw_Value
1    2016-09-01 00:01:00           777.51
2    2016-09-01 00:11:00           694.38
3    2016-09-01 00:22:00           591.69
4    2016-09-01 00:32:00           523.23
5    2016-09-01 00:42:00           567.24
6    2016-09-01 00:52:00           547.68
7    2016-09-01 01:02:00           562.35

First hour of next day (2016-09-02)

143  2016-09-02 00:04:00           557.46
144  2016-09-02 00:14:00           557.46
145  2016-09-02 00:24:00           562.35
146  2016-09-02 00:35:00           552.57
147  2016-09-02 00:45:00           503.67
148  2016-09-02 00:55:00           484.11
149  2016-09-02 01:05:00           454.77

First hour of the third day in a row (2016-09-03)

285  2016-09-03 00:07:00           655.26
286  2016-09-03 00:17:00           537.90
287  2016-09-03 00:27:00           464.55
288  2016-09-03 00:38:00           454.77
289  2016-09-03 00:48:00           425.43
290  2016-09-03 00:58:00           420.54
291  2016-09-03 01:08:00           400.98

And a fourth day:

426  2016-09-04 00:00:00           865.53
427  2016-09-04 00:10:00           723.72
428  2016-09-04 00:20:00           621.03
429  2016-09-04 00:30:00           562.35
430  2016-09-04 00:40:00           493.89
431  2016-09-04 00:51:00           459.66
432  2016-09-04 01:01:00           435.21

After processing the raw signal values I need to produce something like this: 24 hrs profile, and for 168 hours: 168 hrs profile.

Thank you!

Upvotes: 3

Views: 915

Answers (1)

Dave2e

Reputation: 24079

Since you are still struggling with this problem, try this:

#Create sample data
#create a sequence of random times (about 10 minutes apart)
rtime <- as.integer(rnorm(1000, 10, 2))
Date_Time_Stamp <- as.POSIXct("2016-09-01")
Date_Time_Stamp <- Date_Time_Stamp + cumsum(rtime) * 60
Signal_Raw_Value <- rnorm(1000, 600, 20)
df <- data.frame(Date_Time_Stamp, Signal_Raw_Value)
#End of sample data creation

#Calculate the number of minutes since midnight
df$minutes <- as.integer(format(df$Date_Time_Stamp, "%H")) * 60 +
  as.integer(format(df$Date_Time_Stamp, "%M"))
#Break the day into 144 ten-minute intervals (1440 minutes per day)
df$mybreaks <- cut(df$minutes, breaks = seq(0, 1440, 10), include.lowest = TRUE)

#Using dplyr
library(dplyr)
#find the mean of each group
summarise(group_by(df, mybreaks), mean(Signal_Raw_Value))
#find the number of elements in each group
summarise(group_by(df, mybreaks), n())
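
If you also need the 168-hour (weekly) profile mentioned in the question, the same idea extends by adding a weekday column to the grouping. A sketch building on the df created above (column names are just illustrative):

#Weekly (168-hour) profile: group by weekday as well as the 10-minute bin
df$weekday <- format(df$Date_Time_Stamp, "%u")   # "1" = Monday ... "7" = Sunday
weekly_profile <- df %>%
  group_by(weekday, mybreaks) %>%
  summarise(mean_signal = mean(Signal_Raw_Value), n_obs = n())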

Your problem statement was not very clear. Here is a solution which breaks the day into 144 ten-minute segments (1440 minutes per day) and averages the data across the entire dataset into those 144 intervals.

Upvotes: 2
