sym246
sym246

Reputation: 1866

Compute column average based on date and time in R

I have a matrix, which looks a bit like this:

        Date     Time                        Data
15000 04/09/2014 05:45:00                   0.908
15001 04/09/2014 06:00:00                   0.888
15002 04/09/2014 06:15:00                   0.976
15003 04/09/2014 06:30:00                   1.632
15004 04/09/2014 06:45:00                   1.648
15005 04/09/2014 07:00:00                   1.164
15006 04/09/2014 07:15:00                   0.568
15007 04/09/2014 07:30:00                   1.020
15008 04/09/2014 07:45:00                   1.052
15009 04/09/2014 08:00:00                   0.920
15010 04/09/2014 08:15:00                   0.656
15011 04/09/2014 08:30:00                   1.172
15012 04/09/2014 08:45:00                   1.000
15013 04/09/2014 09:00:00                   1.420
15014 04/09/2014 09:15:00                   0.936
15015 04/09/2014 09:30:00                   0.996
15016 04/09/2014 09:45:00                   1.100
15017 04/09/2014 10:00:00                   0.492

It contains a years worth of data, with each day having a 96 rows (15 minute intervals from 00:00 to 23:45). My question is that I'd like to average the data column, for each day, based on the time range I specify. For example, if I wanted to average over times 06:00 - 08:00 for each day, in the code above I should get an answer of 1.0964 for the date 04/09/2014.

I have no idea how to do this using the date and time columns as filters, and wondered if someone could help?

To make things even more complicated, I would also like to compute 45 minute rolling averages for each day, within a different time period, say 04:00 - 09:00. Again, as this is for each day, it would be good to get the result in a matrix for which each row is a certain date, then the columns would represent the rolling averages from say, 04:00 - 04:45, 04:15 - 05:00...

Any ideas?!

Upvotes: 1

Views: 1433

Answers (1)

user5363218
user5363218

Reputation:

check the following code and let me know if anything is unclear

data = read.table(header = T, stringsAsFactors = F,  text  = "Index                Date     Time                        Data
15000 04/09/2014 05:45:00                   0.908
15001 04/09/2014 06:00:00                   0.888
15002 04/09/2014 06:15:00                   0.976
15003 04/09/2014 06:30:00                   1.632
15004 04/09/2014 06:45:00                   1.648
15005 04/09/2014 07:00:00                   1.164
15006 04/09/2014 07:15:00                   0.568
15007 04/09/2014 07:30:00                   1.020
15008 04/09/2014 07:45:00                   1.052
15009 04/09/2014 08:00:00                   0.920
15010 04/09/2014 08:15:00                   0.656
15011 04/09/2014 08:30:00                   1.172
15012 04/09/2014 08:45:00                   1.000
15013 04/09/2014 09:00:00                   1.420
15014 04/09/2014 09:15:00                   0.936
15015 04/09/2014 09:30:00                   0.996
15016 04/09/2014 09:45:00                   1.100
15017 04/09/2014 10:00:00                   0.492")


library("magrittr")

data$parsed.timestamp = paste(data$Date, data$Time) %>% strptime(., format = "%d/%m/%Y %H:%M:%S")

# Hourly Average
desiredGroupingUnit = cut(data$parsed.timestamp, breaks = "hour") #You can use substr for that also 
aggregate(data$Data, by = list(desiredGroupingUnit), FUN = mean )

#           Group.1     x
# 1 2014-09-04 05:00:00 0.908
# 2 2014-09-04 06:00:00 1.286
# 3 2014-09-04 07:00:00 0.951
# 4 2014-09-04 08:00:00 0.937
# 5 2014-09-04 09:00:00 1.113
# 6 2014-09-04 10:00:00 0.492



# Moving average

getAvgBetweenTwoTimeStamps = function(data, startTime, endTime) {
  avergeThoseIndcies = which(data$parsed.timestamp >= startTime &     data$parsed.timestamp <= endTime)
  return(mean(data$Data[avergeThoseIndcies]))
}

movingAvgWindow = 45*60 #minutes
movingAvgTimestamps = data.frame(from = data$parsed.timestamp, to = data$parsed.timestamp + movingAvgWindow)

movingAvgTimestamps$movingAvg =  
  apply(movingAvgTimestamps, MARGIN = 1, 
      FUN = function(x) getAvgBetweenTwoTimeStamps(data = data, startTime = x["from"], endTime = x["to"]))

print(movingAvgTimestamps)

# from                  to movingAvg
# 1  2014-09-04 05:45:00 2014-09-04 06:30:00 1.1010000
# 2  2014-09-04 06:00:00 2014-09-04 06:45:00 1.2860000
# 3  2014-09-04 06:15:00 2014-09-04 07:00:00 1.3550000
# 4  2014-09-04 06:30:00 2014-09-04 07:15:00 1.2530000
# 5  2014-09-04 06:45:00 2014-09-04 07:30:00 1.1000000
# 6  2014-09-04 07:00:00 2014-09-04 07:45:00 0.9510000
# 7  2014-09-04 07:15:00 2014-09-04 08:00:00 0.8900000
# 8  2014-09-04 07:30:00 2014-09-04 08:15:00 0.9120000
# 9  2014-09-04 07:45:00 2014-09-04 08:30:00 0.9500000
# 10 2014-09-04 08:00:00 2014-09-04 08:45:00 0.9370000
# 11 2014-09-04 08:15:00 2014-09-04 09:00:00 1.0620000
# 12 2014-09-04 08:30:00 2014-09-04 09:15:00 1.1320000
# 13 2014-09-04 08:45:00 2014-09-04 09:30:00 1.0880000
# 14 2014-09-04 09:00:00 2014-09-04 09:45:00 1.1130000
# 15 2014-09-04 09:15:00 2014-09-04 10:00:00 0.8810000
# 16 2014-09-04 09:30:00 2014-09-04 10:15:00 0.8626667
# 17 2014-09-04 09:45:00 2014-09-04 10:30:00 0.7960000
# 18 2014-09-04 10:00:00 2014-09-04 10:45:00 0.4920000

Upvotes: 5

Related Questions