Reputation: 3805
Some sample data first:
yr1 <- sample(0:1, 365, replace = TRUE)
yr2 <- sample(0:1, 365, replace = TRUE)
yr3 <- sample(0:1, 365, replace = TRUE)
yr4 <- sample(0:1, 365, replace = TRUE)
value <- c(yr1, yr2, yr3, yr4)
yr <- rep(2000:2003, each = 365)
doy <- rep(1:365, times = 4)
foo <- data.frame(value, yr, doy)
foo contains 3 columns. Column 1 has an arbitrary value which is either 1 or 0. Column 2 contains the year and column 3 has the day of the year (1 to 365).
I have three vectors with start, mid and end days in Julian days:
start <- c(258, 258, 258, 258)
mid <- c(279, 281, 285, 288)
end <- c(286, 295, 300, 320)
range.val <- as.data.frame(cbind(start, mid, end))
range.val$yr <- c(2000, 2001, 2002, 2003)
range.val gives me the Julian days between which I have to sum the values for each year in foo.
For example, for 2000, I need to sum foo$value from day 258 to day 279 and then from day 279 to day 286. Similarly, for 2001, I need to sum foo$value from 258 to 281 and then from 281 to 295.
I also need to calculate the length of the longest continuous occurrence of 1 between these indices for each year.
I did this:
for (yr in 2000:2003) {
  range.sub <- range.val[range.val$yr == yr, ]
  foo.sub <- foo[foo$yr == yr, ]
  # row position within foo.sub equals day of year, so index rows by doy
  sum.1 <- sum(foo.sub[range.sub$start:range.sub$mid, "value"])
  sum.2 <- sum(foo.sub[range.sub$mid:range.sub$end, "value"])
  # longest run of 1s from start to mid
  length.1 <- rle(foo.sub[range.sub$start:range.sub$mid, "value"])
  max.spell.length <- max(length.1$lengths[length.1$values == 1])
  # longest run of 1s from mid to end
  length.2 <- rle(foo.sub[range.sub$mid:range.sub$end, "value"])
  max.spell.length1 <- max(length.2$lengths[length.2$values == 1])
}
In my ongoing effort to minimise the use of for loops, I wonder if I can shorten the above code using some other function.
Upvotes: 0
Views: 62
Reputation: 29075
Here's a solution using dplyr.
Create a joined data frame and flag whether each yr-doy combination is in range 1 (start to mid), range 2 (mid to end), or neither.
library(dplyr)
df <- left_join(foo, range.val, by = "yr")
df <- df %>%
  mutate(in.range1 = doy >= start & doy <= mid,
         in.range2 = doy >= mid & doy <= end)
# Note: I'm not sure if the ranges are supposed to be inclusive on both ends, but you
# should be able to change that easily
For total value in range X for each year, filter for range & summarise by year:
df.sum.1 <- df %>%
  filter(in.range1) %>%   # change to in.range2 for mid-end
  group_by(yr) %>%
  summarise(value = sum(value))
> df.sum.1
# A tibble: 4 x 2
     yr value
  <dbl> <int>
1  2000    12
2  2001    12
3  2002    10
4  2003    10
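If both totals are wanted side by side, the filter step can be folded into summarise() by subsetting value inside each sum. The following is a minimal sketch rather than a tested drop-in: it rebuilds data of the same shape as in the question so that it runs on its own, the seed is arbitrary, and the column names sum.1 and sum.2 are only borrowed from the question's loop.

```r
library(dplyr)

# Data of the same shape as in the question (the seed here is an arbitrary choice)
set.seed(1)
foo <- data.frame(
  value = sample(0:1, 4 * 365, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  yr    = 2000:2003,
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320)
)

# One row per year, with the total for each range in its own column.
# Both ranges are treated as inclusive on both ends, as in the code above.
sums <- foo %>%
  left_join(range.val, by = "yr") %>%
  group_by(yr) %>%
  summarise(
    sum.1 = sum(value[doy >= start & doy <= mid]),
    sum.2 = sum(value[doy >= mid & doy <= end])
  )

sums
```

Note that the mid day is counted in both sums because both ranges include it; change one of the comparisons to a strict inequality if that is not wanted.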
For the longest run of 1's, filter for the range and run rle on the values for each year. Note that we should filter for values == 1 before taking the maximum; otherwise, if there's a longer run of 0's, you may get that instead:
df.spell.length1 <- df %>%
  filter(in.range1) %>%   # change to in.range2 for mid-end
  group_by(yr) %>%
  arrange(doy) %>%
  do(data.frame(unclass(rle(.$value)))) %>%
  filter(values == 1) %>%
  filter(lengths == max(lengths)) %>%
  unique()
> df.spell.length1
# A tibble: 4 x 3
# Groups:   yr [4]
     yr lengths values
  <dbl>   <int>  <int>
1  2000       7      1
2  2001       3      1
3  2002       3      1
4  2003       3      1
(For reproducibility, the sample data was generated with set.seed(123).)
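One thing to watch with the filter(values == 1) step: a year with no 1's at all inside the range drops out of the result entirely. A possible alternative is to wrap rle in a small helper and call it from summarise(), so every year keeps a row. This is only a sketch; longest_run_of_ones is a hypothetical helper name, the data is rebuilt with an arbitrary seed so the block runs on its own, and spell.1 and spell.2 are illustrative column names.

```r
library(dplyr)

# Hypothetical helper: length of the longest run of 1s in x, or 0 if x has no 1s
longest_run_of_ones <- function(x) {
  runs <- rle(x)
  one_runs <- runs$lengths[runs$values == 1]
  if (length(one_runs) == 0) 0L else max(one_runs)
}

longest_run_of_ones(c(0, 1, 1, 0, 1, 1, 1, 0))  # 3
longest_run_of_ones(c(0, 0, 0))                 # 0

# Data of the same shape as in the question (the seed here is an arbitrary choice)
set.seed(1)
foo <- data.frame(
  value = sample(0:1, 4 * 365, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  yr    = 2000:2003,
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320)
)

# Sort by year and day first so that runs follow calendar order,
# then compute the longest run of 1s per year for each range
spells <- foo %>%
  left_join(range.val, by = "yr") %>%
  arrange(yr, doy) %>%
  group_by(yr) %>%
  summarise(
    spell.1 = longest_run_of_ones(value[doy >= start & doy <= mid]),
    spell.2 = longest_run_of_ones(value[doy >= mid & doy <= end])
  )

spells
```

Returning 0 for a range without any 1's is a design choice; NA would work just as well if a missing spell should be distinguishable from a spell of length zero.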
Upvotes: 1