Reputation: 378

Calculate run length aggregated by subject ID conditional on observation == 1

I am trying to use the rle function in R to calculate the run lengths for the variable positive in the example below, aggregated by the variable id.

Here is a toy dataset:

set.seed(123456)
test <- data.frame(id = rep(1:3, each = 24), positive = round(runif(72, 0, 1)))

result <- aggregate(positive ~ id, data = test, FUN = rle)

As currently set up, this computes the run lengths for both values (0 and 1) of the variable positive. Is it possible to restrict the function so that it only returns the run lengths where positive == 1?

At the end of the day, I want to count the number of instances in which two or more consecutive months were positive (positive == 1) for each subject.

UPDATE:

I also have a variable called event that takes the values 0 or 1. For each run of two or more positives found by the code in the answers below, is it possible to stratify the results so that a run in which event == 1 occurs during any of the positive months is classified differently from a run in which event == 0 for all of the months?

The toy dataset looks like this:

set.seed(123456)
test <- data.frame(id = rep(1:3, each = 24), positive = round(runif(72, 0, 1)), event = round(runif(72, 0, 1)))

results <- aggregate(positive ~ id + event, data = test, FUN = function(x) with(rle(x), sum(lengths > 1 & values == 1)))
aggregate(positive ~ event, data = results, FUN = sum)

However, this code returns counts for every combination of event and positive, whereas I would like to count only those occurrences of two or more consecutive positive months in which any event == 1. Alternatively, if it is easier to count only the runs of consecutive positive months in which all event == 0, that would be a fine solution too.

Upvotes: 2

Answers (3)

Ferdinand.kraft

Reputation: 12819

To count occurrences of two or more consecutive positives, use this:

aggregate(positive ~ id, data=test, FUN=function(x) with(rle(x), sum(lengths>=2 & values==1)))

(Inspired by @sgibb's answer.)
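To see why this works, here is a minimal sketch on a made-up vector (the values are hypothetical, not taken from the question's data). rle returns parallel lengths and values vectors, so a run counts only when both conditions hold at the same position:

```r
# Hypothetical six-month series for a single subject
x <- c(1, 1, 0, 1, 1, 1)

r <- rle(x)
r$lengths  # 2 1 3
r$values   # 1 0 1

# Runs of 1s that last two or more months
n_runs <- sum(r$lengths >= 2 & r$values == 1)
n_runs     # 2
```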

EDIT: Counting the number of 2 or more consecutive positives such that any of them has event==1, separated by id:

Calculate the run to which each record belongs:

tmp <- within(test, run <- ave(positive, id, FUN=function(x) cumsum(c(1, diff(x) != 0))))

# id positive event run
#  1        1     1   1
#  1        1     0   1
#  1        0     1   2
#  1        0     0   2
#  1        0     1   2
#  1        0     0   2
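The run index comes from the cumsum/diff idiom. A small sketch on a hypothetical vector (not the question's data) shows each step: diff flags positions where the value changes, and the cumulative sum turns those flags into a run counter.

```r
x <- c(1, 1, 0, 0, 1, 0)    # hypothetical values for one subject

changed <- diff(x) != 0      # TRUE where a value differs from the previous one
run_id <- cumsum(c(1, changed))
run_id                       # 1 1 2 2 3 4
```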

For each id and each run, mark whether there was at least one record with event==1 and the run length was >= 2:

tmp2 <- aggregate(event~id+positive+run, data=tmp, function(x)any(x>0) && length(x)>=2)

# id positive run event
#  2        0   1 FALSE
#  1        1   1  TRUE
#  3        1   1 FALSE
#  1        0   2  TRUE
#  3        0   2  TRUE
#  2        1   2  TRUE

Now simply count how many marked runs there are in each id and each kind of run (positive==1 or positive==0):

aggregate(event~positive+id, tmp2, sum)

# positive id event
#        0  1     1
#        1  1     2
#        0  2     1
#        1  2     3
#        0  3     3
#        1  3     1
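The question also asks about runs in which all event == 0. One possible variation, sketched here under the assumption that only positive runs matter, is to drop the positive == 0 rows first and then split the runs of two or more months by whether any event == 1 occurred. The names pos, run_len, run_event, with_event and without_event are illustrative only.

```r
set.seed(123456)
test <- data.frame(id = rep(1:3, each = 24),
                   positive = round(runif(72, 0, 1)),
                   event = round(runif(72, 0, 1)))

# Run index within each id, as in the step above
test$run <- ave(test$positive, test$id,
                FUN = function(x) cumsum(c(1, diff(x) != 0)))

# Keep only the positive months, then summarise each run
pos <- test[test$positive == 1, ]
run_len   <- aggregate(event ~ id + run, data = pos, FUN = length)
run_event <- aggregate(event ~ id + run, data = pos,
                       FUN = function(e) any(e > 0))

# Both summaries use the same grouping, so their rows line up
long <- run_len$event >= 2
with_event    <- tapply(long & run_event$event,  run_len$id, sum)
without_event <- tapply(long & !run_event$event, run_len$id, sum)

data.frame(id = names(with_event),
           with_event = as.vector(with_event),
           without_event = as.vector(without_event))
```

For each id, the two columns should add up to the count from the first aggregate call in this answer.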

Upvotes: 3

Henrik

Reputation: 67788

A ddply version for the 'at the end of the day' part:

library(plyr)
set.seed(123456)
test <- data.frame(id = rep(1:3, each = 24), positive = round(runif(72, 0, 1))) 

ddply(.data = test, .variables = .(id), function(x) {
  rl <- rle(x$positive)
  # count runs of 1s longer than one month
  sum(rl$lengths[rl$values == 1] > 1)
})

#      id V1
#    1  1  2
#    2  2  5
#    3  3  1

Upvotes: 1

sgibb

Reputation: 25736

Do you mean something like this?

aggregate(positive ~ id, data=test, FUN=function(x) {
  r <- rle(x)
  # lengths of the runs of 1s only
  r$lengths[r$values == 1]
})
#   id            positive
# 1  1       2, 1, 1, 7, 1
# 2  2 4, 2, 1, 4, 2, 1, 2
# 3  3       1, 7, 1, 1, 1

Upvotes: 1
