Reputation: 932
I'm trying to calculate an index of winter severity, and one of the components of the index requires calculating the consecutive number of days < 0 degrees C, prior to and including that particular date. For example:
Day 1 = 2 degrees C
Day 2 = -2 degrees C
Day 3 = -5 degrees C
So the value that I'm trying to calculate (called tempdays) is equal to 0 for Day 1; 1 for Day 2; and 2 for Day 3.
Here's an example showing what the data looks like:
dat <- tibble(
date = seq(as.Date('2010-01-01'), as.Date('2010-01-10'), 1),
temp = c(4.2, 3.35, -0.6, -0.25, 0.8, 0.8, -2.5, -5.25, -0.5, 3.35)
)
dat
date temp
<date> <dbl>
1 2010-01-01 4.2
2 2010-01-02 3.35
3 2010-01-03 -0.6
4 2010-01-04 -0.25
5 2010-01-05 0.8
6 2010-01-06 0.8
7 2010-01-07 -2.5
8 2010-01-08 -5.25
9 2010-01-09 -0.5
10 2010-01-10 3.35
Here's another data set starting with a temp value less than zero since that seemed to cause an issue:
dat2 <- tibble(
date = seq(as.Date('2010-01-01'), as.Date('2010-01-10'), 1),
temp = c(-1.95, -1.1, -2.8, -6.7, 1.4, 4.45, 6.1, 4.7, -1.7, -3.9)
)
dat2
So dat2 should look like this:
date temp tempdays
<date> <dbl> <dbl>
1 2010-01-01 -1.95 1
2 2010-01-02 -1.1 2
3 2010-01-03 -2.8 3
4 2010-01-04 -6.7 4
5 2010-01-05 1.4 0
6 2010-01-06 4.45 0
7 2010-01-07 6.1 0
8 2010-01-08 4.7 0
9 2010-01-09 -1.7 1
10 2010-01-10 -3.9 2
I'm guessing lag()
can be used to do this?
Upvotes: 3
Views: 589
Reputation: 388817
You could create a grouping variable using cumsum
and then use row_number
to generate the consecutive days when the temperature was less than 0.
library(dplyr)
dat %>%
group_by(group = cumsum(temp > 0)) %>%
mutate(tempdays = row_number() - 1) %>%
ungroup() %>%
select(-group)
# date temp tempdays
# <date> <dbl> <dbl>
# 1 2010-01-01 4.2 0
# 2 2010-01-02 3.35 0
# 3 2010-01-03 -0.6 1
# 4 2010-01-04 -0.25 2
# 5 2010-01-05 0.8 0
# 6 2010-01-06 0.8 0
# 7 2010-01-07 -2.5 1
# 8 2010-01-08 -5.25 2
# 9 2010-01-09 -0.5 3
#10 2010-01-10 3.35 0
and using base R that would be with ave
with(dat, ave(temp, cumsum(temp > 0), FUN = seq_along) - 1)
EDIT
This doesn't work as expected if first group is negative. Here is an updated version using rle
which works with dat
as well as dat2
dat2 %>%
mutate(tempdays = with(rle(temp < 0), rep(values, lengths))) %>%
group_by(group = cumsum(temp > 0)) %>%
mutate(tempdays = cumsum(tempdays)) %>%
ungroup() %>%
select(-group)
# date temp tempdays
# <date> <dbl> <int>
# 1 2010-01-01 -1.95 1
# 2 2010-01-02 -1.1 2
# 3 2010-01-03 -2.8 3
# 4 2010-01-04 -6.7 4
# 5 2010-01-05 1.4 0
# 6 2010-01-06 4.45 0
# 7 2010-01-07 6.1 0
# 8 2010-01-08 4.7 0
# 9 2010-01-09 -1.7 1
#10 2010-01-10 -3.9 2
Upvotes: 4
Reputation: 886948
We can use data.table
library(data.table)
setDT(dat)[, tempdays := seq_len(.N) -1 , cumsum(temp > 0)]
dat
# date temp tempdays
# 1: 2010-01-01 4.20 0
# 2: 2010-01-02 3.35 0
# 3: 2010-01-03 -0.60 1
# 4: 2010-01-04 -0.25 2
# 5: 2010-01-05 0.80 0
# 6: 2010-01-06 0.80 0
# 7: 2010-01-07 -2.50 1
# 8: 2010-01-08 -5.25 2
# 9: 2010-01-09 -0.50 3
#10: 2010-01-10 3.35 0
Upvotes: 1