Reputation: 45

average previous time series in R

I'm wondering if there is an easy way to average over the previous 30 seconds of data in R when there may be more than one data point per second.

For instance, for the sample weight taken at 32 seconds, I want the mean of the concentrations recorded in the past 30 seconds (so the mean of 9, 10, 7, ..., 14, 20, 18, 2). For the sample weight taken at 31 seconds, I want the mean of the concentrations recorded in the past 30 seconds (so the mean of 5, 9, 10, 7, ..., 14, 20, 18). It's technically not a rolling average over the 30 previous measurements, because there can be more than one measurement per second.

I'd like to do this in R.

Upvotes: 2

Answers (5)

sharoz

Reputation: 6345

This is sufficiently different that it warrants another answer.

This should do what you're asking with no extra libraries needed.

It just loops through each row, filters based on that row's time, and computes the mean.

Don't fear a simple loop :)

count = 200 # dataset rows
windowTimespan = 30 # timespan of window

# first, let's make some data
df = data.frame(
  # 200 random numbers from 0-99
  time = sort(floor(runif(count)*100)),
  concentration = runif(count),
  weight = runif(count)
)

# add placeholder column(s)
df$rollingMeanWeight = NA
df$rollingMeanConcentration = NA

# for each row
for (r in 1:nrow(df)) {
  # get the time in this row
  thisTime = df$time[r]
  # find all the rows within the acceptable timespan
  # note: figure out if you want < vs <=
  thisSubset = df[
    df$time < thisTime &  
    df$time >= thisTime-windowTimespan
  ,]
  # get the mean of the subset
  df$rollingMeanWeight[r] = mean(thisSubset$weight)
  df$rollingMeanConcentration[r] = mean(thisSubset$concentration)
}

Upvotes: 0

G. Grothendieck

Reputation: 269694

1) sqldf. Using DF below and a 3-second window, join each row of DF to the rows recorded in the last three seconds (endpoints included), then take the mean over them:

DF <- data.frame(time = c(1, 2, 2, 3, 4, 5, 6, 7, 8, 10), data = 1:10)

library(sqldf)
sqldf("select a.*, avg(b.data) mean 
       from DF a join DF b on b.time between a.time - 3 and a.time 
       group by a.rowid")

giving:

   time data mean
1     1    1  1.0
2     2    2  2.0
3     2    3  2.0
4     3    4  2.5
5     4    5  3.0
6     5    6  4.0
7     6    7  5.5
8     7    8  6.5
9     8    9  7.5
10   10   10  9.0

The first mean value is mean(1), which is 1; the second and third mean values are mean(1:3), which is 2; the fourth mean value is mean(1:4), which is 2.5; the fifth mean value is mean(1:5), which is 3; the sixth mean value is mean(2:6), which is 4; the seventh mean value is mean(4:7), which is 5.5 (the two rows at time 2 have dropped out of the window); and so on.

2) This 2nd solution uses no packages. For each row of DF it finds the rows within 3 seconds back and takes the mean of their data:

Mean3 <- function(i) with(DF, mean(data[time <= time[i] & time >= time[i] - 3]))
cbind(DF, mean = sapply(1:nrow(DF), Mean3))

Upvotes: 1

bdecaf

Reputation: 4732

My first idea would be to summarise the data so the value column would contain a list of all values.

library(dplyr)  # needed for %>%, arrange, group_by, summarise, left_join
test.data <- data.frame(t = 1:50 + rbinom(50, 30, 0.3), y=rnorm(50)) %>% arrange(t)
prep <- test.data %>% group_by(t) %>% summarise(vals = list(y))
wrk <- left_join(data.frame(t=1:max(test.data$t)), prep, by='t')

Unfortunately, zoo's rollapply would not work on such a data.frame.
For testing, I was thinking of using a window of only 5 rows. I tried commands along the lines of: rollapply(wrk, 5, function(z) mean(unlist(z)))

But maybe someone else can fill in the missing bit of information.
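One possible way to fill in that missing bit, as a rough sketch rather than a tested solution: skip rollapply and loop over the list of values directly. This version uses base R only, so `vals`, `all_t`, `window` and `roll_mean` are names made up for the illustration, and `split` stands in for the group_by/summarise step above.

```r
set.seed(1)  # only so the illustration is reproducible
test.data <- data.frame(t = 1:50 + rbinom(50, 30, 0.3), y = rnorm(50))

# One list element per observed time, holding every value seen at that time
vals_by_t <- split(test.data$y, test.data$t)

# Align the list to every integer time; times with no data become NULL
all_t <- seq_len(max(test.data$t))
vals <- unname(vals_by_t[as.character(all_t)])

window <- 5  # window length in time steps (rows of the aligned list)

# For each time step, pool all values in the trailing window and average them
roll_mean <- sapply(seq_along(vals), function(i) {
  pooled <- unlist(vals[max(1, i - window + 1):i])
  if (length(pooled) == 0) NA_real_ else mean(pooled)
})

wrk <- data.frame(t = all_t, roll_mean = roll_mean)
```

Because every integer time gets its own row, a window of 5 rows here corresponds to 5 time units, no matter how many measurements fall inside it.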

Upvotes: 0

StatMan

Reputation: 636

You can do (assuming your data is stored in a dataframe called df):

now <- 32
step <- 30
subsetData <- subset(df, time >= (now-step) & time < now)
average <- mean(subsetData$concentration)

If you want to calculate the mean at more time points, you can put this in a loop and update now on each iteration.
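A minimal sketch of that loop, written with sapply so that each row's own time plays the role of now. The data frame below is made up purely for illustration, and `window_mean` and `meanPrev30` are hypothetical names, not anything from the question.

```r
# Hypothetical data: several readings can share the same second
df <- data.frame(
  time          = c(1, 2, 2, 3, 10, 31, 32, 32),
  concentration = c(5, 9, 10, 7, 14, 20, 18, 2)
)

step <- 30  # length of the look-back window in seconds

# Mean concentration over [now - step, now), excluding readings taken at `now`
window_mean <- function(now) {
  in_window <- df$time >= (now - step) & df$time < now
  if (!any(in_window)) return(NA_real_)  # nothing recorded before this time
  mean(df$concentration[in_window])
}

# Apply the window to every row, using that row's time as `now`
df$meanPrev30 <- sapply(df$time, window_mean)
```

Whether the boundaries should be strict or inclusive (`<` versus `<=`) depends on whether a reading taken at exactly now should count toward its own average.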

Upvotes: 0

sharoz

Reputation: 6345

The rollapply function should do the trick.

library(zoo)
rollapply(weight.vector, 30, mean)
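Note that a fixed width of 30 counts observations, not seconds. If the window has to be 30 seconds, one option is to work out a per-row width from the timestamps first. The sketch below does that in base R with findInterval; the `time` and `weight` vectors are invented for the example and are assumed to be sorted by time.

```r
# Hypothetical sorted timestamps (seconds, with repeats) and measurements
time   <- c(1, 2, 2, 3, 20, 31, 32, 32, 40)
weight <- c(4, 6, 5, 7, 9, 3, 8, 2, 6)
span   <- 30  # window length in seconds

# Number of rows at or before (time - span); those fall outside the window
n_outside <- findInterval(time - span, time)

# Per-row window width in observations, counting back from the current row
width <- seq_along(time) - n_outside

# Mean over each row's own window; rows after the current one are not
# included, even if they share its timestamp
time_mean <- sapply(seq_along(time), function(i) {
  mean(weight[(i - width[i] + 1):i])
})
```

As far as I know, a width vector like this can also be handed to zoo's rollapplyr in place of the single number, but treat that as something to check against the zoo documentation.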

Upvotes: 0
