Reputation: 167
I have a data frame which looks like
z<-data.frame(a=c(seq(1990,1995,1), 1997,1998,1999,2001,2002,2003), b=seq(90,101,1))
I use the function
rollapply(z$b, 3, sd, align='right')
to calculate a rolling standard deviation.
What I want is for the calculation to stop whenever there is a gap between consecutive years and to start again after the gap.
EDIT:
My sample output should look like this:
a b c
1 1990 90 NA
2 1991 91 NA
3 1992 92 sd(90,91,92)
4 1993 93 sd(93,92,91)
5 1994 94 sd(94,93,92)
6 1995 95 sd(95,94,93)
7 1997 96 NA
8 1998 97 NA
9 1999 98 sd(98,97,96)
10 2001 99 NA
11 2002 100 NA
12 2003 101 sd(101,100,99)
Upvotes: 0
Views: 264
Reputation: 269471
Convert the data.frame to a zoo object, z, and merge that with a grid, g, of all years, including the ones not found in z. Apply rollapplyr to that and extract the original times:
library(zoo)
z <- read.zoo(DF, FUN = identity)
g <- merge(z, zoo(, start(z):end(z)))
r <- rollapplyr(g, 3, sd, fill = NA)[I(time(z))]
giving:
> r
1990 1991 1992 1993 1994 1995 1997 1998 1999 2001 2002 2003
NA NA 1 1 1 1 NA NA 1 NA NA 1
r is a zoo object for which time(r) gives the times and coredata(r) gives the data.
Note: We have used:
DF <- structure(list(V1 = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L,
1997L, 1998L, 1999L, 2001L, 2002L, 2003L), V2 = 90:101), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -12L))
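If the result is needed as a column of the original data frame rather than as a zoo object, something along these lines may work. This is only a sketch: the column name c is taken from the question's sample output, and DF is rebuilt here with the same values as in the note above so that the block runs on its own.

```r
library(zoo)

# Same values as the DF used above: years with gaps after 1995 and 1999.
DF <- data.frame(V1 = c(1990:1995, 1997:1999, 2001:2003), V2 = 90:101)

z <- read.zoo(DF, FUN = identity)        # zoo series indexed by year
g <- merge(z, zoo(, start(z):end(z)))    # gap years (1996, 2000) become NA
r <- rollapplyr(g, 3, sd, fill = NA)[I(time(z))]

# coredata() drops the time index; the values line up with the rows of DF
# because r was subset back to the original times.
DF$c <- as.numeric(coredata(r))
DF
```

Any window that touches a gap year contains an NA, so sd returns NA for it, which is what restarts the calculation after each gap.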
Upvotes: 1
Reputation: 52637
I think this does what you want:
library(zoo)  # provides rollapply
my.roll <- function(x) rollapply(x, 3, sd, align='right', fill=NA, na.rm=TRUE)
z$sd <- ave(z$b, c(0, cumsum(diff(z$a) - 1)), FUN=my.roll)
Produces:
a b sd
1 1990 90 NA
2 1991 91 NA
3 1992 92 1
4 1993 93 1
5 1994 94 1
6 1995 95 1
7 1997 96 NA
8 1998 97 NA
9 1999 98 1
10 2001 99 NA
11 2002 100 NA
12 2003 101 1
Note how the first two entries after each gap are NA because you need at least three values in your window.
Basically, we use cumsum and diff to identify the blocks of contiguous years, and then use ave to apply the rolling sd within each block. Note that this will break if you have repeated years (e.g. 1997 appears two or more times) or if your data isn't sorted by year.
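To see how the grouping works, it may help to look at the intermediate vectors on their own. The sketch below uses the years from the question; the names step, block and block2 are just illustrative, and block2 is an alternative labelling that is not part of the answer above.

```r
# Years from the question, with gaps after 1995 and 1999.
a <- c(1990:1995, 1997:1999, 2001:2003)

# diff(a) is 1 between consecutive years and larger across a gap,
# so diff(a) - 1 is nonzero only at the first year after a gap.
step <- diff(a) - 1

# The running sum turns those markers into one id per contiguous block;
# the leading 0 labels the first year.
block <- c(0, cumsum(step))
block

# Alternative labelling (an assumption, not from the answer): start a new
# block wherever the step to the next year is not exactly 1.
block2 <- cumsum(c(TRUE, diff(a) != 1))
block2
```

With a repeated year, diff(a) - 1 would be -1 at that position, so the running sum would step back down and could reuse an earlier block id. That is one way to see why the approach assumes sorted, unique years.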
Upvotes: 2