Reputation: 378
I am trying to use the rle function in R to calculate the run lengths for the variable positive in the example below, aggregated by the variable id.
Here is a toy dataset (that admittedly has a few quirks):
test <- c('id', 'positive')
test$id <- rep(1:3, c(24, 24, 24))
set.seed(123456)
test$positive <- round(runif(72, 0, 1))
test <- data.frame(test)
test <- subset(test, select = -X.id.)
test <- subset(test, select = -X.positive.)
result <- aggregate(positive ~ id, data = test, FUN = rle)
As currently set up, this reads the run lengths for all possible values (0 and 1) of the variable positive. Is it possible to condition this function so that it only evaluates the run lengths when positive == 1?
At the end of the day, I want to figure out how to count the number of instances in which two or more consecutive months were positive (positive == 1) for each subject.
UPDATE:
I have a variable called event that has values of 0 or 1. For each occurrence of two or more consecutive positives identified by the code in the suggestions below, is it possible to stratify the results so that a run in which event == 1 occurs during any of the positive months is classified differently from a run of positives in which event == 0 for all of the months?
The toy dataset looks like this:
set.seed(123456)
x <- c(1, 2, 1)
test <- data.frame(id = rep(1:3, each = 24), positive = round(runif(72, 0, 1)), event = round(runif(72, 0, 1)))
results <- aggregate(positive ~ id + event, data = test, FUN=function(x) with(rle(x), sum(lengths > 1 & values == 1)))
aggregate(positive ~ event, data = results, FUN=sum)
However, this code gives all possible combinations of event and positive, whereas I would like to limit the results to counting only those occurrences of two or more consecutive positive months for which any event == 1. Alternatively, if it is easier to evaluate only the number of consecutive positive months for which all event == 0, that would be a fine solution too.
Upvotes: 2
Views: 913
Reputation: 12819
To count occurrences of two or more consecutive positives, use this:
aggregate(positive ~ id, data=test, FUN=function(x) with(rle(x), sum(lengths>=2 & values==1)))
(Inspired by @sgibb's answer.)
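To see why the one-liner works, here is a small sketch on a made-up vector (the vector x and the name n_runs are only for illustration, not part of the question's data). rle() returns the length and the value of each run, so the filter keeps runs of 1s lasting at least two months and sum() counts them:

```r
# Toy vector (made up for illustration): 1 = positive month, 0 = negative month
x <- c(1, 1, 0, 1, 0, 0, 1, 1, 1)

r <- rle(x)
r$lengths  # 2 1 1 2 3
r$values   # 1 0 1 0 1

# TRUE only for runs of 1s that last at least two months; sum() counts the TRUEs
n_runs <- sum(r$lengths >= 2 & r$values == 1)
n_runs     # 2
```

The run of two 0s is ignored because of the values == 1 condition, and the single 1 in the middle is ignored because of the lengths >= 2 condition.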
EDIT: Counting the number of 2 or more consecutive positives such that any of them has event==1, separated by id:
Calculate the run to which each record belongs:
tmp <- within(test, run <- ave(positive, by=id, FUN=function(x)cumsum(c(1,diff(x)!=0))))
# id positive event run
# 1 1 1 1
# 1 1 0 1
# 1 0 1 2
# 1 0 0 2
# 1 0 1 2
# 1 0 0 2
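The run index comes from the cumsum(c(1, diff(x) != 0)) idiom. A minimal sketch on a made-up vector (x, changed and run_id are illustrative names) shows how it behaves within a single id:

```r
# Toy vector (made up for illustration)
x <- c(1, 1, 0, 0, 0, 1, 0, 0)

# diff(x) != 0 is TRUE wherever the value changes from the previous element;
# the leading 1 starts the first run
changed <- c(1, diff(x) != 0)

# Each change bumps the counter, so equal neighbours share a run number
run_id <- cumsum(changed)
run_id  # 1 1 2 2 2 3 4 4
```

Wrapping this in ave() applies it separately within each id, so the run numbering restarts for every subject.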
For each id and each run, mark whether there was at least one record with event==1 and the run length was >= 2:
tmp2 <- aggregate(event~id+positive+run, data=tmp, function(x)any(x>0) && length(x)>=2)
# id positive run event
# 2 0 1 FALSE
# 1 1 1 TRUE
# 3 1 1 FALSE
# 1 0 2 TRUE
# 3 0 2 TRUE
# 2 1 2 TRUE
Now simply count how many marked runs there are in each id and each kind of run (positive==1 or positive==0):
aggregate(event~positive+id, tmp2, sum)
# positive id event
# 0 1 1
# 1 1 2
# 0 2 1
# 1 2 3
# 0 3 3
# 1 3 1
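If only the runs of positives are of interest, the same idea can be sketched more directly with rle() alone. This is just one possible approach under stated assumptions: count_runs is a hypothetical helper (not from any package), and toy is a small made-up data set rather than the question's test data. It splits the qualifying runs (positive == 1, length >= 2) into those with at least one event == 1 and those with event == 0 throughout:

```r
# Hypothetical helper: count runs of positives of length >= 2,
# split by whether any month in the run has event == 1
count_runs <- function(positive, event) {
  r <- rle(positive)
  # Label every record with the index of the run it belongs to
  run_id <- rep(seq_along(r$lengths), r$lengths)
  # For each run, is there at least one record with event == 1?
  any_event <- as.vector(tapply(event, run_id, function(e) any(e == 1)))
  keep <- r$values == 1 & r$lengths >= 2
  c(with_event = sum(keep & any_event),
    without_event = sum(keep & !any_event))
}

# Small made-up data set for illustration
toy <- data.frame(
  id       = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  positive = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0),
  event    = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1)
)

# One row per id, one column per kind of run
res <- t(sapply(split(toy, toy$id),
                function(d) count_runs(d$positive, d$event)))
res
```

For id 1 this gives one run with an event and one without; for id 2 the only qualifying run has no event, because the event == 1 months fall on negative months or on a run of length one.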
Upvotes: 3
Reputation: 67788
A ddply version for the 'at the end of the day' part:
library(plyr)
set.seed(123456)
test <- data.frame(id = rep(1:3, each = 24), positive = round(runif(72, 0, 1)))
ddply(.data = test, .variables = .(id), function(x){
  # run-length encode the positive column within this id
  rl <- rle(x$positive)
  # count runs of 1s longer than one month
  sum(rl$lengths[rl$values == 1] > 1)
})
# id V1
# 1 1 2
# 2 2 5
# 3 3 1
Upvotes: 1
Reputation: 25736
Do you mean something like this?:
aggregate(positive ~ id, data=test, FUN=function(x) {
  r <- rle(x)
  # lengths of the runs of 1s only
  return(r$lengths[r$values == 1])
})
# id positive
# 1 1 2, 1, 1, 7, 1
# 2 2 4, 2, 1, 4, 2, 1, 2
# 3 3 1, 7, 1, 1, 1
Upvotes: 1