I want to tally the number of times that consecutive observations match a condition. For instance, in foo below I would like to tally the number of days in March where consecutive values of y are more than one standard deviation below the mean value of y for that month. My data are laid out like foo:
```r
library(dplyr)      # for %>%, filter(), group_by(), tally()
library(lubridate)  # for month() and year()

foo <- data.frame(x = seq.Date(as.Date("1981/1/1"),
                               as.Date("2000/12/31"), "day"))
foo$y <- arima.sim(n = nrow(foo), list(ar = c(0.8)))
```
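I've figured out how to tally the number of days in March for each year where y is more than one standard deviation below the mean:

```r
# Note: filter() runs before any grouping, so mean(y) and sd(y) here are
# computed over the whole series, not over March only.
bar <- foo %>%
  filter(month(x) == 3 & y < mean(y) - sd(y)) %>%
  group_by(year(x)) %>%
  tally()
```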
I would like the count to include only days that match the condition and are consecutive. For example, if the mean of y for March is 0 and the sd is 1, and March 5, 6 and 7 of 1990 are all below -1, the tally for 1990 would be 3. If March 21 is also below -1 but March 20 and 22 are not, the tally would still be 3, because March 21 has no neighbours that are also below -1.
I imagine rle comes into play, but I don't understand how.
Any advice is appreciated.
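The rule in the example can be checked with base R's rle() on a toy vector. This is a minimal sketch: the March values are made up so that the mean is 0, the sd is treated as 1 (threshold -1), and days 5 to 7 and 21 fall below the threshold, as in the example.

```r
# Hypothetical March series for one year: 31 days, all 0 except the
# days from the example, which are set below the threshold of -1.
y <- rep(0, 31)
y[c(5, 6, 7, 21)] <- -1.5

below <- y < -1   # TRUE on days 5, 6, 7 and 21
r <- rle(below)   # runs: F(4) T(3) F(13) T(1) F(10)

# Count only the TRUE days that belong to runs of length 2 or more,
# so the isolated day 21 is ignored.
n_consecutive <- sum(r$lengths[r$values & r$lengths >= 2])
n_consecutive
# [1] 3
```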
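This should work:

```r
library(dplyr)
library(tidyr)  # for separate()

foo %>%
  separate(x, sep = "-", into = c("year", "month", "day")) %>%
  filter(month == "03") %>%
  group_by(year) %>%
  mutate(z = y < mean(y) - sd(y),              # is the day below the threshold?
         g = {r <- rle(z)
              r$values[r$lengths < 2] <- FALSE  # drop runs of a single day
              inverse.rle(r)}) %>%
  tally(g)                                       # count the TRUE values of g
```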
```
# A tibble: 20 x 2
   year      n
   <chr> <int>
 1 1981      2
 2 1982      6
 3 1983      4
 4 1984      4
 5 1985      3
 6 1986      5
 7 1987      3
 8 1988      7
 9 1989      5
10 1990      4
11 1991      7
12 1992      4
13 1993      6
14 1994      5
15 1995      3
16 1996      5
17 1997      5
18 1998      4
19 1999      6
20 2000      6
```
I kept z and g as separate columns so you can check the intermediate result (remove the final tally(g) to see them).
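For comparison, here is a base-R sketch of the same idea without dplyr or tidyr. The helper count_consecutive() is a hypothetical name introduced only for this illustration; like the answer above, it uses each year's March mean and sd as the threshold. set.seed() is an addition for reproducibility, so the counts will not match the table above.

```r
# Hypothetical helper: total number of TRUE values that sit in runs of
# at least `min_run` consecutive TRUEs.
count_consecutive <- function(cond, min_run = 2) {
  r <- rle(cond)
  sum(r$lengths[r$values & r$lengths >= min_run])
}

set.seed(1)  # assumption: only to make the simulated data reproducible
foo <- data.frame(x = seq.Date(as.Date("1981/1/1"),
                               as.Date("2000/12/31"), "day"))
foo$y <- as.numeric(arima.sim(n = nrow(foo), list(ar = 0.8)))

# Keep March only and split the y values by year
march <- foo[format(foo$x, "%m") == "03", ]
by_year <- split(march$y, format(march$x, "%Y"))

# Threshold per year: that year's March mean minus one sd
res <- sapply(by_year, function(y) count_consecutive(y < mean(y) - sd(y)))
res
```

Returning a named vector keeps the years as names, which makes it easy to compare against the dplyr version.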
UPDATE: rle() takes a vector and returns an object with two components:
1. lengths: how many times each value is repeated consecutively in the vector.
2. values: the value of each of those runs.
Take this example:
```r
seq <- c("a", "a", "a", "b", "b", "c")
rle_obj <- rle(seq)
rle_obj
```

```
Run Length Encoding
  lengths: int [1:3] 3 2 1
  values : chr [1:3] "a" "b" "c"
```
Now you can manipulate the runs and expand them back with inverse.rle(). For example, turn the run of "b" into a run of length 4 instead of 2:
```r
rle_obj$lengths[rle_obj$values == "b"] <- 4
inverse.rle(rle_obj)
```

```
[1] "a" "a" "a" "b" "b" "b" "b" "c"
```
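Applied to a logical vector, this is what the g = {...} block in the answer does: runs shorter than two are switched to FALSE, and inverse.rle() expands the runs back to one value per day. A small sketch with a made-up vector:

```r
# Made-up daily indicator: days 2-4 form a run of three, day 6 is isolated
z <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
r <- rle(z)                         # lengths 1 3 1 1 1, values F T F T F

r$values[r$lengths < 2] <- FALSE    # single-day runs become FALSE
g <- inverse.rle(r)                 # back to one value per day
g
# [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
sum(g)                              # what tally(g) counts per group
# [1] 3
```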
Hope that gave you some insight.