Sarah Grogan

Reputation: 137

Function or loop to subset moving average in R

I'm new to loops and functions in R.

Imagine I have measurements at every 0.1 units from 1.0 to 3.5 for four samples (A, B, C, D).

I want to find the average measurement within +/- 0.2 units of 1.5, 2.5, and 3.5. So, for 1.5 I'm averaging the values at 1.3, 1.4, 1.5, 1.6, and 1.7, and likewise for the other two.

How can I write a statement to summarize those three average values for all four samples? I think it might start something like this:

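X <- c(1.5, 2.5, 3.5)

avg <- list()   # one entry per center, so each pass doesn't overwrite the last
for (i in X)
{
  # 0.21 rather than 0.2 so the endpoints (e.g. 1.3 and 1.7) survive floating-point error
  avg[[as.character(i)]] <- colMeans(subset(data, measurement > (i - 0.21) & measurement < (i + 0.21)))
}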

I've also considered using '[' instead:

colMeans(data[data$measurement > (i - 0.21) & data$measurement < (i + 0.21), ])

Thanks for the help so far. sqldf is a really nice tool, and the example does just what I want!

However, I can't get it to work with the real data set. I modified the code so it looks like this (sorry, it no longer corresponds to the sample data set):

M <- sqldf("select r.i,avg(w.X1),avg(w.X2),avg(w.X3),avg(w.X4)
           from Y r, Y w
           where w.i between r.i - 1 and r.i + 1
           group by r.i
           having r.i+0.0 in (600, 700, 800)")

For context, I am trying to summarize the average of all points from 599–601, 699–701 and 799–801, for four columns named X1, X2, X3, X4. I named this data frame 'Y'. The rows are wavelengths, and the data points are the amount of light reflected at each wavelength.

Do you see anything wrong with the above code? It creates a matrix with the right dimensions, but the averages don't match what I get when I compute them directly from the larger dataset. I may be misunderstanding part of the code, for instance the role of the 'w' alias.

Upvotes: 1

Views: 1106

Answers (1)

nh2

Reputation: 620

Proper indexing is faster than the loop.

library(zoo)
set.seed(1)
x <- as.character(seq(1,3.5,.1)) 
z <- zoo(data.frame(a=rnorm(length(x)),
                    b=rnorm(length(x)),
                    c=rnorm(length(x))),
         x)
z2 <- rollmean(z, k = 5, align = "center")[as.character(seq(1,3.5,.5)),]
> z2
              a           b          c
1.5  0.46601479  0.40153999  0.2007418
2    0.31015536 -0.22912642  0.4673692
2.5 -0.04141133  0.31978341  0.4350507
3    0.63816023 -0.07509644 -0.3622883

> data.frame(z2, index = index(z2))
              a           b          c index
1.5  0.46601479  0.40153999  0.2007418   1.5
2    0.31015536 -0.22912642  0.4673692     2
2.5 -0.04141133  0.31978341  0.4350507   2.5
3    0.63816023 -0.07509644 -0.3622883     3
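
For anyone without zoo installed, the same centered 5-point moving average can be sketched in base R with stats::filter. This is only an illustration, not part of the zoo approach above: the data frame df, the matrix ma and the result ma_sub are names made up for this example, and the data are regenerated with the same seed rather than taken from z.

```r
# Sketch, assuming equally spaced measurements every 0.1 units.
set.seed(1)
x  <- seq(1, 3.5, 0.1)
df <- data.frame(a = rnorm(length(x)),
                 b = rnorm(length(x)),
                 c = rnorm(length(x)))

k <- 5  # window width in rows: the center plus two neighbors on each side

# stats::filter with sides = 2 centers the window on each row;
# rows too close to either edge come back as NA.
ma <- sapply(df, function(col) as.numeric(stats::filter(col, rep(1 / k, k), sides = 2)))

# Pick out the rows for the centers of interest. Matching on rounded tenths
# avoids comparing floating-point values such as 1.5 directly.
centers <- c(1.5, 2, 2.5, 3)
rows    <- match(round(centers * 10), round(x * 10))
ma_sub  <- data.frame(index = centers, ma[rows, , drop = FALSE])
print(ma_sub)
```

Like rollmean, this only handles full windows, so it gives nothing useful at 1.0 or 3.5.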

If you want the partial fills on the edges where the window is less than 5 wide:

> rollapply(z, width = 5, align = "center", partial = TRUE, FUN = mean)[as.character(seq(1,3.5,.5)),]
              a           b           c
1   -0.42614637 -0.70156598  0.21492677
1.5  0.46601479  0.40153999  0.20074176 
2    0.31015536 -0.22912642  0.46736921 
2.5 -0.04141133  0.31978341  0.43505071
3    0.63816023 -0.07509644 -0.36228832
3.5 -0.47521823  0.22239574 -0.05024676

If the window sizes are irregular but the points are equally spaced, as mentioned in the comment:

> z2 <- as.data.frame(z)
> z2$i <- row.names(z2)
> library(sqldf)
> sqldf("select a.i,avg(b.a),avg(b.b),avg(b.c) 
         from z2 a, z2 b 
         where b.i between a.i - .21 and a.i + .21 
         group by a.i 
         having a.i+0.0 in (1.5,2.0,2.5,3.0,3.5)")
    i    avg(b.a)    avg(b.b)    avg(b.c)
1 1.5  0.46601479  0.40153999  0.20074176
2   2  0.31015536 -0.22912642  0.46736921
3 2.5 -0.04141133  0.31978341  0.43505071
4   3  0.63816023 -0.07509644 -0.36228832
5 3.5 -0.47521823  0.22239574 -0.05024676
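
On the role of the second alias: the query joins the table to itself. The first alias (a here, r in the question) supplies the window centers, and the second (b here, w in the question) supplies the rows that fall inside each window and get averaged. A rough base R equivalent, written as a sketch, may make that easier to see. window_means is a hypothetical helper made up for this example, not a function from sqldf or zoo, and it assumes the index column can be converted to numeric.

```r
# Hypothetical helper: for each center, average every value column over the
# rows whose index lies within half_width of that center.
window_means <- function(df, index_col, centers, half_width) {
  idx        <- as.numeric(df[[index_col]])   # compare as numbers, not text
  value_cols <- setdiff(names(df), index_col)
  out <- do.call(rbind, lapply(centers, function(ctr) {
    in_window <- abs(idx - ctr) <= half_width  # plays the part of alias b / w
    colMeans(df[in_window, value_cols, drop = FALSE])
  }))
  data.frame(center = centers, out, row.names = NULL)
}

# Example data built the same way as above (same seed, same column order).
set.seed(1)
x  <- seq(1, 3.5, 0.1)
df <- data.frame(i = x,
                 a = rnorm(length(x)),
                 b = rnorm(length(x)),
                 c = rnorm(length(x)))

res <- window_means(df, "i", centers = c(1.5, 2.5, 3.5), half_width = 0.21)
print(res)
```

For the real data the call would be something like window_means(Y, "i", c(600, 700, 800), 1), assuming Y has a numeric column i holding the wavelength. Comparing the result against the sqldf output for one center is a quick way to see which of the two disagrees with a hand-computed mean.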

Upvotes: 2
