Reputation: 45
I'm wondering if there is an easy way to average over the previous 30 seconds of data in R when there may be more than one data point per second.
For instance, for the sample weight taken at 32 seconds, I want the mean of the concentrations recorded in the past 30 seconds, so the mean of 9, 10, 7, ..., 14, 20, 18, 2.
For the sample weight taken at 31 seconds, I want the mean of the concentrations recorded in the past 30 seconds, so the mean of 5, 9, 10, 7, ..., 14, 20, 18.
It's technically not a rolling average over the 30 previous measurements, because there can be more than one measurement per second.
I'd like to do this in R.
Upvotes: 2
Views: 535
Reputation: 6345
This is sufficiently different that it warrants another answer.
This should do what you're asking with no extra libraries needed.
It just loops through each row, filters based on that row's time, and computes the mean.
Don't fear a simple loop :)
count = 200 # dataset rows
windowTimespan = 30 # timespan of window
# first, let's make some data
df = data.frame(
  # 200 random integer times from 0-99, sorted
  time = sort(floor(runif(count)*100)),
  concentration = runif(count),
  weight = runif(count)
)
# add placeholder column(s)
df$rollingMeanWeight = NA
df$rollingMeanConcentration = NA
# for each row
for (r in seq_len(nrow(df))) {
  # get the time in this row
  thisTime = df$time[r]
  # find all the rows within the acceptable timespan
  # note: figure out if you want < vs <=
  # (< excludes rows recorded at thisTime itself, including row r)
  thisSubset = df[
    df$time < thisTime &
    df$time >= thisTime-windowTimespan
  ,]
  # get the mean of the subset (NaN when no rows fall in the window)
  df$rollingMeanWeight[r] = mean(thisSubset$weight)
  df$rollingMeanConcentration[r] = mean(thisSubset$concentration)
}
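To see what the `<` versus `<=` choice in the loop above changes, here is a minimal sketch on a small data frame that can be checked by hand. The toy data and the helper name `window_mean_at` are made up for illustration and are not part of the answer.

```r
# Toy data: two measurements share time 2 (values are illustrative)
toy <- data.frame(
  time          = c(1, 2, 2, 3, 5),
  concentration = c(10, 20, 30, 40, 50)
)

# Hypothetical helper: mean concentration in the window ending at row r.
# include_current = FALSE reproduces the strict `<` used in the loop above.
window_mean_at <- function(d, r, span, include_current = FALSE) {
  t_now <- d$time[r]
  upper <- if (include_current) d$time <= t_now else d$time < t_now
  mean(d$concentration[upper & d$time >= t_now - span])
}

rows <- seq_len(nrow(toy))
strict    <- sapply(rows, function(r) window_mean_at(toy, r, span = 2))
inclusive <- sapply(rows, function(r) window_mean_at(toy, r, span = 2, include_current = TRUE))
cbind(toy, strict, inclusive)
```

With the strict comparison the first row has no earlier data, so its mean is NaN; with the inclusive one, every row at least averages over itself.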
Upvotes: 0
Reputation: 269694
1) sqldf. Using DF below and a 3-second window, join the last three seconds of data to each row of DF and then take the mean over them:
DF <- data.frame(time = c(1, 2, 2, 3, 4, 5, 6, 7, 8, 10), data = 1:10)
library(sqldf)
sqldf("select a.*, avg(b.data) mean
from DF a join DF b on b.time between a.time - 3 and a.time
group by a.rowid")
giving:
   time data mean
1     1    1  1.0
2     2    2  2.0
3     2    3  2.0
4     3    4  2.5
5     4    5  3.0
6     5    6  4.0
7     6    7  5.5
8     7    8  6.5
9     8    9  7.5
10   10   10  9.0
The first mean value is mean(1), which is 1; the second and third are mean(1:3), which is 2; the fourth is mean(1:4), which is 2.5; the fifth is mean(1:5), which is 3; the sixth is mean(2:6), which is 4; the seventh is mean(4:7), which is 5.5; and so on.
2) This second solution uses no packages. For each row of DF it finds the rows within 3 seconds back and takes the mean of their data:
Mean3 <- function(i) with(DF, mean(data[time <= time[i] & time >= time[i] - 3]))
cbind(DF, mean = sapply(1:nrow(DF), Mean3))
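The same idea can take the window width as an argument instead of hard-coding 3. The following is only a sketch: `window_mean` is a hypothetical name, and it keeps the inclusive bounds used by Mean3 and by the SQL `between`.

```r
# Hypothetical generalisation of Mean3: the window width is a parameter
window_mean <- function(time, value, span) {
  sapply(seq_along(time), function(i) {
    # rows whose time lies in [time[i] - span, time[i]], both ends inclusive
    mean(value[time <= time[i] & time >= time[i] - span])
  })
}

DF <- data.frame(time = c(1, 2, 2, 3, 4, 5, 6, 7, 8, 10), data = 1:10)
means <- window_mean(DF$time, DF$data, span = 3)
cbind(DF, mean = means)
```

For `span = 3` this should give the same mean column as the sqldf query in 1); for the question's case one would pass `span = 30`.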
Upvotes: 1
Reputation: 4732
My first idea would be to summarise the data so the value column would contain a list of all values.
library(dplyr) # provides %>%, arrange, group_by, summarise, left_join
test.data <- data.frame(t = 1:50 + rbinom(50, 30, 0.3), y=rnorm(50)) %>% arrange(t)
prep <- test.data %>% group_by(t) %>% summarise(vals = list(y))
wrk <- left_join(data.frame(t=1:max(test.data$t)), prep, by='t')
Unfortunately, zoo's rollapply would not work on such a data.frame.
For testing, I was planning to use a window of only 5 lines.
I tried commands along the lines of rollapply(wrk, 5, function(z) mean(unlist(z))).
But maybe someone else can fill in the missing bit of information.
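One way the missing step might be filled in without rollapply is to pool the per-time value lists by hand. This is a sketch in base R only, on a small fixed data set rather than the random one above. The names `vals_by_t`, `grid` and `roll_mean` are illustrative, and the window is assumed to cover the current time plus the 4 time units before it (5 lines of a one-row-per-time grid).

```r
# Small fixed data set (illustrative): several y values can share one t
test.data <- data.frame(t = c(1, 1, 2, 4, 4, 4, 7), y = c(2, 4, 6, 1, 2, 3, 10))

# One list element per distinct time, holding all values seen at that time
vals_by_t <- split(test.data$y, test.data$t)
times <- as.numeric(names(vals_by_t))

# Regular grid of times, one per time unit, like the `wrk` frame above
grid <- seq_len(max(test.data$t))
width <- 5  # assumed window: the current time and the 4 time units before it

roll_mean <- sapply(grid, function(g) {
  in_window <- times <= g & times > g - width
  if (!any(in_window)) return(NA_real_)  # no measurements in this window
  mean(unlist(vals_by_t[in_window]))     # pool all values, then average
})
data.frame(t = grid, mean = roll_mean)
```

Pooling the values before averaging means a second with three measurements counts three times, which matches the question's requirement that every data point in the window contributes equally.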
Upvotes: 0
Reputation: 636
You can do the following (assuming your data is stored in a data frame called df):
now <- 32
step <- 30
subsetData <- subset(df, time >= (now-step) & time < now)
average <- mean(subsetData$concentration)
If you want to calculate the mean at more time points, you can put this in a loop that adjusts now on each iteration.
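As a rough sketch of that extension, the subset-and-mean step can be wrapped in a function and applied to a vector of time points. The data below and the names `mean_before` and `time_points` are invented for illustration; the column names follow the snippet above.

```r
# Illustrative data; column names follow the snippet above
df <- data.frame(
  time          = c(1, 5, 5, 20, 31, 32, 40),
  concentration = c(4, 6, 8, 10, 12, 14, 16)
)

step <- 30

# Mean concentration over [now - step, now), as in the subset() call above
mean_before <- function(now) {
  subsetData <- subset(df, time >= (now - step) & time < now)
  mean(subsetData$concentration)
}

# Evaluate at several time points instead of a single `now`
time_points <- c(31, 32, 40)
averages <- sapply(time_points, mean_before)
data.frame(now = time_points, average = averages)
```

Passing `df$time` as `time_points` would give one average per row of the data.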
Upvotes: 0
Reputation: 6345
The rollapply function from the zoo package should do the trick.
library(zoo)
rollapply(weight.vector, 30, mean)
Upvotes: 0