R- Create a variable that is conditional upon row values

Question

I have a dataframe test that looks like this:

ID  Month1  Month2  Month3  Month4  Month5  Month6  Month7  Month8
    A   0       1      1       1       1        0       0        0
    B   0       1      1       1       0        0      NA        NA
    C   0       0      0       0       NA      NA      NA        NA
    D   0       1      1       1       1        1       1        1
    E   0       1      1       0       0        0       1        1
    F   0       0      0       1       1        1       1        1
    G   0       0      1       1       0        0       0        0
    H   0       0      0       0       0        0       0        0

I want to create a conditional variable “Result” within the data frame that counts the number of times 1 is repeated in a row. However, I only want to count the first set of 1s that is followed by a 0, which means that there are three conditions that I need to keep in mind when crating the “Result” variable. The conditions are as follows:

Condition 1: Count the first set of 1s that end with a 0.

Condition 2: If the set of 1s does not end in 0 or ends in NA, Result=NA

Condition 3 If there are two sets of 1s that end in the zero, ignore the second one.

ID  Month1  Month2  Month3  Month4  Month5  Month6  Month7  Month8  Result 
    A   0       1      1       1       1        0       0        0       4
    B   0       1      1       1       0        0      NA        NA      3
    C   0       0      0       0       NA      NA      NA        NA     0
    D   0       1      1       1       1        1       1        1      NA
    E   0       1      1       0       0        0       1        1      2
    F   0       0      0       1       1        1       1        1      NA
    G   0       0      1       1       0        0       0        0      2
    H   0       0      0       0       0        0       0        0      0

This that I have tried so far:

test$Result=apply(test[,c(2:9)],1,function(x) length(x[x==1&!(is.na(x))]))

this counts the total number of 1 in the row and not just the first set of ones.

Any and all help would be much appreciated.

MrFlick · Accepted Answer

I think using rle() to calculate `lengths of runs of values would be helpful here. Here's how i'm assiming your data is stored in R

dd <- structure(list(ID = structure(1:8, .Label = c("A", "B", "C", 
    "D", "E", "F", "G", "H"), class = "factor"), Month1 = c(0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), Month2 = c(1L, 1L, 0L, 1L, 1L, 0L, 0L, 
    0L), Month3 = c(1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L), Month4 = c(1L, 
    1L, 0L, 1L, 0L, 1L, 1L, 0L), Month5 = c(1L, 0L, NA, 1L, 0L, 1L, 
    0L, 0L), Month6 = c(0L, 0L, NA, 1L, 0L, 1L, 0L, 0L), Month7 = c(0L, 
    NA, NA, 1L, 1L, 1L, 0L, 0L), Month8 = c(0L, NA, NA, 1L, 1L, 1L, 
    0L, 0L)), .Names = c("ID", "Month1", "Month2", "Month3", "Month4", 
    "Month5", "Month6", "Month7", "Month8"), class = "data.frame", 
    row.names = c(NA, -8L))

Then I would run code like

dd$Result <- apply(dd[, -1], 1, function(x) {
    rr<-rle(x)
    f1<-which(rr$values==1)[1]
    if ( is.na(f1) ) {
        0
    } else if (!is.na(rr$values[f1+1]) && rr$values[f1+1]==0) {
        rr$lengths[f1]
    } else {
        NA
    }
})

which gives

  ID Month1 Month2 Month3 Month4 Month5 Month6 Month7 Month8 Result
1  A      0      1      1      1      1      0      0      0      4
2  B      0      1      1      1      0      0     NA     NA      3
3  C      0      0      0      0     NA     NA     NA     NA      0
4  D      0      1      1      1      1      1      1      1     NA
5  E      0      1      1      0      0      0      1      1      2
6  F      0      0      0      1      1      1      1      1     NA
7  G      0      0      1      1      0      0      0      0      2
8  H      0      0      0      0      0      0      0      0      0

as you desired.

You do have a few edges cases that made this code somewhat tricky. First we have to see of there are any 1's; if not, we return 0. Then if there is a one, we want to make sure it's followed by a 0, otherwise return NA. We perform this logic for each row with an apply(), leaving off the ID column. It's not entirely clear from your description what should happen with NA values if they appear before the end of the row (like at the beginning) so i'll just assume that's a case you don't have to worry about.

R- Create a variable that is conditional upon row values

Answers (2)

Related Questions