89_Simple

Reputation: 3805

R: using apply family instead of for-loops

Some sample data first

yr1 <- sample(0:1, 365, replace = TRUE)
yr2 <- sample(0:1, 365, replace = TRUE)
yr3 <- sample(0:1, 365, replace = TRUE)
yr4 <- sample(0:1, 365, replace = TRUE)

value <- c(yr1, yr2, yr3, yr4)

yr <- rep(2000:2003, each = 365)
doy <- rep(1:365, times = 4)

foo <- as.data.frame(cbind(value, yr, doy))

foo contains 3 columns. Column 1 holds an arbitrary value that is either 1 or 0, column 2 holds the year, and column 3 holds the day of the year (1 to 365).

I have three vectors with the start, mid and end days, given as Julian days (day of the year):

start <- c(258, 258,258,258)
mid <- c(279, 281,285,288)
end <- c(286, 295,300,320)

range.val <- as.data.frame(cbind(start, mid, end))
range.val$yr<- c(2000, 2001, 2002, 2003)

range.val gives me the julian days between which I have to sum the values for each year in foo.

For example, for 2000, I need to sum foo$value from day 258 to day 279 and then from day 279 to day 286. Similarly, for 2001, I need to sum foo$value from day 258 to day 281 and then from day 281 to day 295.

I also need to calculate the length of the longest continuous occurrence of 1 within each of these ranges for each year.

I did this:

for(y in 2000:2003){   # loop variable named y so it does not overwrite the yr vector

    range.sub <- range.val[range.val$yr == y,]
    foo.sub <- foo[foo$yr == y,]

    # sums over start..mid and mid..end (both ends inclusive)
    sum.1 <- sum(foo.sub[range.sub$start:range.sub$mid,"value"])
    sum.2 <- sum(foo.sub[range.sub$mid:range.sub$end,"value"])

    # longest run within start..mid
    length.1 <- rle(foo.sub[range.sub$start:range.sub$mid,"value"])
    max.spell.length <- max(length.1$lengths)

    # longest run within mid..end
    length.2 <- rle(foo.sub[range.sub$mid:range.sub$end,"value"])
    max.spell.length1 <- max(length.2$lengths)
}

In my continuous effort to minimise the use of for-loops, I wonder whether the above code can be shortened using some other function.

Upvotes: 0

Views: 62

Answers (1)

Z.Lin

Reputation: 29075

Here's a solution using dplyr.

Create a joined data frame and indicate whether each yr-doy combination is in range 1 (start to mid), range 2 (mid to end), or neither.

library(dplyr)

df <- left_join(foo, range.val, by = "yr")
df <- df %>%
  mutate(in.range1 = doy >= start & doy <= mid,
         in.range2 = doy >= mid & doy <= end)
# Note: I'm not sure if the ranges are supposed to be inclusive on both ends, but you
# should be able to change that easily
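On the inclusive/exclusive question: with both ranges inclusive, the mid day is counted in range 1 and in range 2. A small base R sketch of the difference, using made-up boundaries (start, mid, end and in.range2x here are illustrative values and names only, not the question's data):

```r
doy <- 1:10
start <- 2; mid <- 5; end <- 8   # hypothetical boundaries

in.range1  <- doy >= start & doy <= mid   # days 2..5
in.range2  <- doy >= mid   & doy <= end   # days 5..8 (day 5 is shared)
in.range2x <- doy >  mid   & doy <= end   # days 6..8 (day 5 only in range 1)

sum(in.range1 & in.range2)    # 1: the mid day falls in both ranges
sum(in.range1 & in.range2x)   # 0: no overlap
```

The for-loop in the question uses start:mid and mid:end, so it also counts the mid day twice. Whether that is intended is for the asker to decide.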

For total value in range X for each year, filter for range & summarise by year:

df.sum.1 <- df %>%
  filter(in.range1) %>% #change to in.range2 for mid-end
  group_by(yr) %>%
  summarise(value = sum(value))

> df.sum.1
# A tibble: 4 x 2
     yr value
  <dbl> <int>
1  2000    12
2  2001    12
3  2002    10
4  2003    10
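If both totals are wanted in one table, one option is to subset inside summarise instead of filtering first. This is only a sketch on a tiny made-up data set (toy, toy.range and toy.sums are hypothetical names), chosen so the result can be checked by hand:

```r
library(dplyr)

# Tiny hypothetical data set: one year, days 1..10, with known values
toy <- data.frame(
  value = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 1),
  yr    = 2000,
  doy   = 1:10
)
toy.range <- data.frame(yr = 2000, start = 2, mid = 5, end = 8)

toy.sums <- toy %>%
  left_join(toy.range, by = "yr") %>%
  group_by(yr) %>%
  summarise(
    sum.1 = sum(value[doy >= start & doy <= mid]),  # start..mid, inclusive
    sum.2 = sum(value[doy >= mid & doy <= end])     # mid..end, inclusive
  )

toy.sums   # sum.1 is 2 (days 2..5), sum.2 is 3 (days 5..8)
```

The same summarise call should work on the joined df above, since it already carries start, mid and end on every row.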

For the longest run of 1's, filter for the range and run rle on the values for each year. Note that we should keep only the runs with values == 1 before taking the maximum; otherwise, if there's a longer run of 0's, you may get that instead:

df.spell.length1 <- df %>%
  filter(in.range1) %>% #change to in.range2 for mid-end
  group_by(yr) %>%
  arrange(doy) %>%
  do(data.frame(unclass(rle(.$value)))) %>%
  filter(values == 1) %>%
  filter(lengths == max(lengths)) %>%
  unique()

> df.spell.length1
# A tibble: 4 x 3
# Groups: yr [4]
     yr lengths values
  <dbl>   <int>  <int>
1  2000       7      1
2  2001       3      1
3  2002       3      1
4  2003       3      1
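The rle step can also be wrapped in a small function, which avoids do() (marked as superseded in recent dplyr versions). A minimal sketch; longest.run is a hypothetical helper name, and returning 0 when there are no 1's at all is my own assumption about what is wanted:

```r
# Hypothetical helper: length of the longest run of 1s in a 0/1 vector
longest.run <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]   # keep only the runs of 1s
  if (length(ones) == 0) 0L else max(ones)
}

longest.run(c(0, 1, 1, 0, 1, 1, 1, 0))  # 3
longest.run(c(0, 0, 0))                 # 0
```

With that in place, something like summarise(spell = longest.run(value[order(doy)])) after the filter and group_by steps should give one row per year, including years that have no 1's in the range.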

(For reproducibility, the sample data was generated with set.seed(123).)
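Since the title asks about the apply family, here is a rough base R sketch as well, using Map (a wrapper around mapply) over the rows of range.val. The helper names longest.run and per.year are made up, the data is rebuilt in a compact form, and the seed is only there to make the sketch repeatable, so the numbers are not guaranteed to match the output above:

```r
set.seed(123)  # assumed seed, only so the sketch is repeatable
foo <- data.frame(
  value = sample(0:1, 4 * 365, replace = TRUE),
  yr    = rep(2000:2003, each = 365),
  doy   = rep(1:365, times = 4)
)
range.val <- data.frame(
  yr    = 2000:2003,
  start = c(258, 258, 258, 258),
  mid   = c(279, 281, 285, 288),
  end   = c(286, 295, 300, 320)
)

# Hypothetical helper: longest run of 1s (0 if there are none)
longest.run <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]
  if (length(ones) == 0) 0L else max(ones)
}

# Hypothetical helper: one row of results for one year
per.year <- function(y, s, m, e) {
  v <- foo$value[foo$yr == y]   # assumes rows are ordered by doy within a year
  data.frame(yr = y,
             sum.1 = sum(v[s:m]), sum.2 = sum(v[m:e]),
             spell.1 = longest.run(v[s:m]), spell.2 = longest.run(v[m:e]))
}

# Call per.year once per row of range.val and stack the results
res <- do.call(rbind, Map(per.year, range.val$yr, range.val$start,
                          range.val$mid, range.val$end))
res
```

This keeps the logic of the original loop but collects the results for every year in one data frame instead of overwriting them on each iteration.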

Upvotes: 1
