Reputation: 3805
first define some function to bind list rowwise and column wise
# a function to append vectors row wise
rbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- rbind(res, list[[i]])
return(res)
}
cbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- cbind(res, list[[i]])
return(res)
}
# generate sample data
sample.dat <- list()
set.seed(123)
for(i in 1:365){
vec1 <- sample(c(0,1), replace=TRUE, size=5)
sample.dat[[i]] <- vec1
}
dat <- rbindlist(sample.dat)
dat
has five columns. Each column is a location and has 365 days of the year (365 rows) with values 1 or 0.
I have another dataframe (see below) which has certain days of the year for each column (location) in dat
.
# generate second sample data
set.seed(123)
sample.dat1 <- list()
for(i in 1:5){
vec1 <- sort(sample(c(258:365), replace=TRUE, size=4), decreasing = F)
sample.dat1[[i]] <- vec1
}
dat1 <- cbindlist(sample.dat1)
I need to use dat1
to subset days in dat
to do a calculation. An example below:
1) For location 1 (first column in both dat1
and dat
):
In column 1 of dat
, select the days from 289 till 302 (using dat1
), find the longest consecutive occurrence of 1.
Repeat it and this time select the days from 303 (302 + 1) till 343 from dat
, find the longest consecutive occurrence of 1.
Repeat it for 343 till 353: select the days from 344 (343 + 1) till 353, find the longest consecutive occurrence of 1.
2) Do this for all the columns
If I want to do sum of 1s, I can do this:
dat <- as.tibble(dat)
dat1 <- as.tibble(dat1)
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
sum.range1 <- sum(..1[range1:range2]) # this will generate sum between range 1 and range 2
sum.range2 <- sum(..1[range2:range3]) # this will generate sum between range 2 and range 3
sum.range3 <- sum(..1[range3:range4]) # this will generate sum between range 3 and range 4
c(sum.range1=sum.range1,sum.range2=sum.range2,sum.range3=sum.range3)
})
For longest consequtive occurrence of 1 between each range, I thought of using the rle
function. Example below:
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
spell.range1 <- rle(..1[range1:range2]) # sort the data, this shows the longest run of ANY type (0 OR 1)
spell.1.range1 <- tapply(spell.range1$lengths, spell.range1$values, max)[2] # this should select the maximum consequtive run of 1
spell.range2 <- rle(..1[range2:range3]) # sort the data, this shows the longest run of ANY type (0 OR 1)
spell.1.range2 <- tapply(spell.range2$lengths, spell.range2$values, max)[2] # this should select the maximum consequtive run of 1
spell.range3 <- rle(..1[range3:range4]) # sort the data, this shows the longest run of ANY type (0 OR 1)
spell.1.range3 <- tapply(spell.range3$lengths, spell.range3$values, max)[2] # this should select the maximum consequtive run of 1
c(spell.1.range1 = spell.1.range1, spell.1.range2 = spell.1.range2, spell.1.range3 = spell.1.range3)
})
I get an error which I think is because I am not using the rle
function properly here. I would really like to keep the code as above since
my others code are in the same pattern and format of the outputs is suited for my need, so I would appreciate if someone can suggest how to fix it.
Upvotes: 1
Views: 2402
Reputation: 42544
OP's code does work for me. So, without a specific error message it is impossible to understand why the code is not working for the OP.
However, the sample datasets created by the OP are matrices (before they were coerced to tibble
) and I felt challenged to find a way to solve the task in base R without using purrr
:
To find the number of consecutive occurences of a particular value val
in a vector x
we can use the following function:
max_rle <- function(x, val) {
y <- rle(x)
len <- y$lengths[y$value == val]
if (length(len) > 0) max(len) else NA
}
Examples:
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2
# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
start <- head(dat1[, col_num], -1L)
end <- tail(dat1[, col_num], -1L) - 1
sapply(seq_along(start), function(range_num) {
max_rle(dat[start[range_num]:end[range_num], col_num], 1)
})
})
[[1]] [1] 8 4 5 [[2]] [1] 4 5 2 [[3]] [1] NA 3 4 [[4]] [1] 5 5 4 [[5]] [1] 3 2 3
The first lapply()
loops over the columns of dat
and dat1
, resp. The second sapply()
loops over the row ranges stored in dat1
and subsets dat
accordingly.
Upvotes: 3