Dealing with NA and boolean sums in R

Question

I'm trying to create a new variable as a sum of products of numeric values and logical indicators, but the result does not make sense.

DATA

I've generated the following dataset as a minimal reproducible example for my issue. Each row is an individual, and the columns mJSW_BL, mJSW_12, mJSW_24 and mJSW_36 are the measurements at baseline and at 12, 24 and 36. The last variable I'm creating, JSNCASE_TP, indicates the first time point (12, 24 or 36) at which an individual meets the definition of a case (a decrease of at least 0.7 from baseline). The calculation of JSNCASE_TP should ignore NA values, and JSNCASE_TP can take the values 0 (never a case), 12, 24 or 36.

library(dplyr)

set.seed(1)
N = 10
mJSW_BL <- runif(N,0.1,2)
mJSW_12 <- runif(N,0.1,2)
mJSW_24 <- runif(N,0.1,2)
mJSW_36 <- runif(N,0.1,2)

#Randomly set some values to NA
mJSW_12[sample(N,2)] <- NA
mJSW_36[sample(N,1)] <- NA

#Create dataframe
df <- data.frame(mJSW_BL,mJSW_12,mJSW_24,mJSW_36)

df2 <- df %>%
       #Create variables indicating decrease from BL
       mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
              mJSW_BLto24 = mJSW_BL - mJSW_24,
              mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
       #JSN case - decrease by 0.7 from BL
       mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
              JSNCASE_24 = (mJSW_BLto24>=0.7),
              JSNCASE_36 = (mJSW_BLto36>=0.7)) %>%
       #Which timepoint did JSN first occur?
       mutate(JSNCASE_TP = sum(12*JSNCASE_12, 
                               24*(JSNCASE_24 & !JSNCASE_12),
                               36*(JSNCASE_36 & !(JSNCASE_12 | JSNCASE_24)),
                               na.rm=TRUE))

ISSUES

In df2, take row 4, for example: JSNCASE_12, JSNCASE_24 and JSNCASE_36 are all TRUE, but JSNCASE_TP = 36. It should be JSNCASE_TP = 12. Likewise, in row 6, JSNCASE_12 = NA, JSNCASE_24 = TRUE and JSNCASE_36 = FALSE, so I should get JSNCASE_TP = 24. Maybe I'm missing something basic, but I've tried several approaches and none produces the desired result. The values of JSNCASE_TP for the 10 rows should be 0, 0, 0, 12, 0, 24, 24, 0, 0, 0.
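Part of the symptom comes from how sum() behaves inside a plain mutate(): it is an aggregate, so it is evaluated over the whole column and the single result is recycled to every row. A minimal sketch on a hypothetical toy data frame (`toy` and `a` are illustrative names, not from the question) contrasts this with rowwise():

```r
library(dplyr)

# Hypothetical toy data, not the question's df
toy <- data.frame(a = c(1, 2, 3))

# Plain mutate(): sum() sees the whole column, so every row gets 6
col_sum <- toy %>% mutate(s = sum(a))

# rowwise(): sum() is evaluated one row at a time, so s equals a
row_sum <- toy %>% rowwise() %>% mutate(s = sum(a)) %>% ungroup()

col_sum$s  # 6 6 6
row_sum$s  # 1 2 3
```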

EDIT: Thanks to @Dave2e's comments, the code below works (note the added rowwise() step and the is.na() checks):

df2 <- df %>%
   #Create variables indicating decrease from BL
   mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
          mJSW_BLto24 = mJSW_BL - mJSW_24,
          mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
   #JSN case - decrease by 0.7 from BL
   mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
          JSNCASE_24 = (mJSW_BLto24>=0.7),
          JSNCASE_36 = (mJSW_BLto36>=0.7)) %>%
   rowwise() %>%
   #Which timepoint did JSN first occur?
   mutate(JSNCASE_TP = sum(12*JSNCASE_12, 
                           24*(JSNCASE_24 & (!JSNCASE_12| is.na(JSNCASE_12))),
                           36*(JSNCASE_36 & ((!JSNCASE_12 | is.na(JSNCASE_12)) & 
                                             (!JSNCASE_24 | is.na(JSNCASE_24)))),
                           na.rm=TRUE))
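The is.na() guards matter because of R's three-valued logic. A small sketch using the values from row 6 (JSNCASE_12 = NA, JSNCASE_24 = TRUE); `j12` and `j24` are illustrative stand-ins, not columns of df2:

```r
j12 <- NA    # stands in for JSNCASE_12 in row 6
j24 <- TRUE  # stands in for JSNCASE_24 in row 6

naive   <- j24 & !j12                 # NA, because !NA is NA and TRUE & NA is NA
guarded <- j24 & (!j12 | is.na(j12))  # TRUE, because NA | TRUE is TRUE

# With na.rm = TRUE the NA terms are simply dropped from the sum
tp_naive   <- sum(12 * j12, 24 * naive,   na.rm = TRUE)  # 0
tp_guarded <- sum(12 * j12, 24 * guarded, na.rm = TRUE)  # 24
```

So without the guards, na.rm = TRUE silently turns row 6 into 0 instead of 24.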

Dave2e · Accepted Answer

Having NA mixed in with TRUE/FALSE does complicate things.

Here is a hack using the apply() function. For each row it finds the first column containing TRUE, then multiplies that column's position by 12 to get the time point. Since all columns may be FALSE (or NA), it has to handle the case where min() is called on an empty result and returns Inf.

df2 <- df %>%
  #Create variables indicating decrease from BL
  mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
         mJSW_BLto24 = mJSW_BL - mJSW_24,
         mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
  #JSN case - decrease by 0.7 from BL
  mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
         JSNCASE_24 = (mJSW_BLto24>=0.7),
         JSNCASE_36 = (mJSW_BLto36>=0.7))


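# Columns 8:10 are JSNCASE_12, JSNCASE_24 and JSNCASE_36
df2$JSNCASE_TP <- 12 * apply(df2[, 8:10], 1, function(x) {
  # which() skips NA; min() of an empty result returns Inf (with a warning)
  ifelse(is.infinite(min(which(x == TRUE))), 0, min(which(x == TRUE)))
})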
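A possible warning-free variant of the same idea, sketched on hypothetical toy flags rather than the question's df2: match(TRUE, x) returns the position of the first TRUE (NA entries never match) or NA when there is none, so min() and its Inf case are not needed. `first_case_tp` is a hypothetical helper name, and like the original it assumes the columns are ordered 12, 24, 36.

```r
# Hypothetical toy flags standing in for df2[, 8:10]; rows are individuals
flags <- data.frame(
  JSNCASE_12 = c(TRUE,  NA,    FALSE, FALSE, NA),
  JSNCASE_24 = c(TRUE,  TRUE,  FALSE, NA,    NA),
  JSNCASE_36 = c(TRUE,  FALSE, TRUE,  FALSE, NA)
)

# Position of the first TRUE times 12, or 0 when the row has no TRUE
first_case_tp <- function(x) {
  i <- match(TRUE, x)
  if (is.na(i)) 0 else 12 * i
}

tp <- apply(flags, 1, first_case_tp)
tp  # 12 24 36 0 0
```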

I'm sure there is a dplyr version of this as well.
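One possible dplyr version, sketched on hypothetical toy flags with the same column names as df2: case_when() is vectorised and returns the value of the first condition that holds, so no rowwise() step is needed. `x %in% TRUE` is used here to treat NA as "not a case"; dplyr's coalesce(x, FALSE) would be an equivalent choice.

```r
library(dplyr)

# Hypothetical toy flags with the same column names as df2
flags <- data.frame(
  JSNCASE_12 = c(TRUE,  NA,    FALSE, FALSE, NA),
  JSNCASE_24 = c(TRUE,  TRUE,  FALSE, NA,    NA),
  JSNCASE_36 = c(TRUE,  FALSE, TRUE,  FALSE, NA)
)

tp <- flags %>%
  mutate(JSNCASE_TP = case_when(
    JSNCASE_12 %in% TRUE ~ 12,  # %in% TRUE maps NA to FALSE
    JSNCASE_24 %in% TRUE ~ 24,
    JSNCASE_36 %in% TRUE ~ 36,
    TRUE ~ 0                    # never a case: all FALSE or NA
  ))

tp$JSNCASE_TP  # 12 24 36 0 0
```

Applied to the question's df2, the same mutate() call would replace the sum() expression directly.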
