marsl

Reputation: 1029

Aggregate a data.frame by time series and with different functions

I have lots of measurement values, recorded each minute. For some of them I also have the mean, min and max for the given minute. I'd like to summarize/aggregate the whole data.frame so that it has one entry every 30 minutes. This is its structure:

str(wgData)
'data.frame':   115200 obs. of  7 variables:
 $ TIMESTAMP          : POSIXct, format: "2012-11-24 00:00:00" "2012-11-24 00:01:00" "2012-11-24 00:02:00" "2012-11-24 00:03:00" ...
 $ RECORD             : int  11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 ...
 $ TPanel             : num  -0.075 -0.075 -0.075 -0.095 -0.095 -0.095 -0.095 -0.118 -0.118 -0.118 ...
 $ VBattery           : num  13.8 13.8 13.8 13.8 13.8 ...
 $ VBatteryHeating_Avg: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Min: num  12.2 12.2 12.2 12.2 12.2 ...
 $ VBatteryHeating_Max: num  12.2 12.2 12.2 12.2 12.2 ...

So for every 30 minutes I'd like to calculate: TIMESTAMP, mean of TPanel (temperature of the panel), mean of VBattery, mean of VBatteryHeating_Avg, min of VBatteryHeating_Min, and max of VBatteryHeating_Max.

I had some success by doing

wgData30min <- aggregate(list(TP = wgData$TPanel, VB=wgData$VBatteryHeating_Avg, VB_MIN=wgData$VBatteryHeating_Min, VB_MAX=wgData$VBatteryHeating_Min),
               list(Timestamp = cut(wgData$TIMESTAMP, "30 min")),
               mean)
head(wgData30min)
            Timestamp         TP       VB   VB_MIN   VB_MAX
1 2012-11-24 00:00:00 -0.1621667 12.15467 12.15333 12.15333
2 2012-11-24 00:30:00 -0.4751667 12.13333 12.13133 12.13133
3 2012-11-24 01:00:00 -0.5647333 12.11167 12.11067 12.11067
4 2012-11-24 01:30:00 -0.4573667 12.09133 12.08967 12.08967
5 2012-11-24 02:00:00 -0.4923667 12.07100 12.07000 12.07000
6 2012-11-24 02:30:00 -0.6469000 12.04933 12.04733 12.04733

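... but this applies mean to every column (and VB_MAX above is mistakenly taken from VBatteryHeating_Min, which is why VB_MIN and VB_MAX are identical). What I did not manage is to pass a different function for each column (mean, min, max). Any help is appreciated.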

Upvotes: 0

Views: 1752

Answers (1)

James Pringle

Reputation: 1079

I believe your data looks something like this (random values, recorded every 10 minutes instead of every minute):

# 167 time points, 600 s (10 min) apart
seconds <- seq(0, 100000, by = 600)
dates <- as.POSIXlt(seconds, origin = "2012-11-24", tz = "UTC")
# random stand-ins for the measurements; no seed is set, so your numbers will differ
TPanel <- rnorm(167)
VBatteryHeating_Avg <- rcauchy(167)
VBatteryHeating_Min <- runif(167)
VBatteryHeating_Max <- rexp(167)

wgData <- data.frame(TIMESTAMP = dates, 
                     TPanel = TPanel, 
                     VBatteryHeating_Avg = VBatteryHeating_Avg, 
                     VBatteryHeating_Min = VBatteryHeating_Min, 
                     VBatteryHeating_Max = VBatteryHeating_Max)

head(wgData)
##             TIMESTAMP     TPanel VBatteryHeating_Avg VBatteryHeating_Min VBatteryHeating_Max
## 1 2012-11-24 00:00:00  0.4770116          10.2937806          0.80151633           0.8722767
## 2 2012-11-24 00:10:00  0.0304906         -20.7057773          0.32311092           0.7172383
## 3 2012-11-24 00:20:00  1.4875903           0.5749393          0.74020471           0.5857239
## 4 2012-11-24 00:30:00  0.4933884           6.6567398          0.73824231           0.3691020
## 5 2012-11-24 00:40:00 -0.0369843           3.4332840          0.06552402           0.2455765
## 6 2012-11-24 00:50:00  0.7339858          -3.3787044          0.06451802           0.5952835

Probably the best solution is to use plyr. First, use cut as before to create a grouping variable for your 30-minute chunks. Then use ddply to split the data frame by that variable and summarize each piece.
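One detail worth checking first: when breaks is given as a string such as "30 min", cut() starts the first bin at the earliest timestamp rounded down to the minute, not at a clock half-hour. In the question's data the first record is at 00:00:00, so the bins line up with :00 and :30. The small base R check below uses made-up timestamps (an assumption for illustration) to show the difference, and shows that passing explicit breaks built with seq() pins the bins to the half-hours.

```r
# Made-up 1-minute timestamps that straddle the 00:30 boundary
stamps <- as.POSIXct(c("2012-11-24 00:28:00", "2012-11-24 00:29:00",
                       "2012-11-24 00:30:00", "2012-11-24 00:31:00"), tz = "UTC")

# Breaks given as a string: the first bin starts at min(stamps) rounded down
# to the minute (00:28), so all four times fall into a single bin
auto_bins <- cut(stamps, "30 min")
print(auto_bins)

# Explicit breaks pinned to clock half-hours: [00:00, 00:30) and [00:30, 01:00)
half_hours <- seq(as.POSIXct("2012-11-24 00:00:00", tz = "UTC"),
                  by = "30 min", length.out = 3)
fixed_bins <- cut(stamps, breaks = half_hours)
print(as.integer(fixed_bins))  # 1 1 2 2
```

If your data might start at an odd minute, the same explicit breaks (covering the whole time range) can be passed when building Timestamp30min.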

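# factor with one level per 30-minute chunk
wgData$Timestamp30min <- cut(wgData$TIMESTAMP, "30 min")

library(plyr)

# split by chunk; each named argument becomes an output column
# computed with its own summary function
out <- ddply(wgData, .(Timestamp30min), summarize,
             TP = mean(TPanel),
             VB = mean(VBatteryHeating_Avg),
             VB_min = min(VBatteryHeating_Min),
             VB_max = max(VBatteryHeating_Max))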

head(out)
##        Timestamp30min         TP          VB     VB_min    VB_max
## 1 2012-11-24 00:00:00  0.6650308 -3.27901911 0.32311092 0.8722767
## 2 2012-11-24 00:30:00  0.3967966  2.23710649 0.06451802 0.5952835
## 3 2012-11-24 01:00:00 -0.1326459 -1.20082543 0.50358789 1.0569388
## 4 2012-11-24 01:30:00  0.7845420 -0.07520645 0.14500901 0.9656004
## 5 2012-11-24 02:00:00 -0.4523882  0.40472169 0.24997021 1.4056166
## 6 2012-11-24 02:30:00 -0.2317818  0.61860868 0.64909054 0.2338781

Alternatively, you could call aggregate once per function (mean, min and max) and merge the results, two data frames at a time.
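A minimal base R sketch of that aggregate/merge route, assuming the column names from the question. Here the functions are supplied as a named list, one per column, which also covers the "different function for each column" part of the question. The simulated 1-minute data and the names funs and pieces are hypothetical, chosen for illustration; Reduce() simply applies merge() pairwise.

```r
# Simulated 1-minute data (hypothetical values, column names from the question)
set.seed(1)
stamps <- seq(as.POSIXct("2012-11-24 00:00:00", tz = "UTC"),
              by = "1 min", length.out = 120)
wgData <- data.frame(
  TIMESTAMP           = stamps,
  TPanel              = rnorm(120),
  VBatteryHeating_Avg = rnorm(120, 12.2, 0.05),
  VBatteryHeating_Min = rnorm(120, 12.1, 0.05),
  VBatteryHeating_Max = rnorm(120, 12.3, 0.05)
)
wgData$Timestamp30min <- cut(wgData$TIMESTAMP, "30 min")

# One summary function per column; the list names select the columns
funs <- list(
  TPanel              = mean,
  VBatteryHeating_Avg = mean,
  VBatteryHeating_Min = min,
  VBatteryHeating_Max = max
)

# aggregate() each column with its own function ...
pieces <- Map(function(col, f) {
  aggregate(wgData[col], by = wgData["Timestamp30min"], FUN = f)
}, names(funs), funs)

# ... then merge the pieces pairwise on the shared grouping column
out <- Reduce(function(a, b) merge(a, b, by = "Timestamp30min"), pieces)
head(out)
```

Since Timestamp30min is the only column the pieces share, merge() joins on it without adding .x/.y suffixes.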

Upvotes: 3
