Reputation: 636
I'm trying to create a new variable that involves the sum of products of a numeric value and a logical (indicator), but the final result does not make sense.
DATA
I've generated the following dataset as a minimum reproducible example for the issue I'm having. Each row is an individual and the columns mJSW_BL
, mJSW_12
, mJSW_24
, and mJSW_36
are the measurements at baseline, 12, 24 and 36. The last variable I'm creating, JSNCASE_TP
indicates the first time (12, 24 or 36) for which an individual meets the definition of of case (decrease from baseline by 0.7). The calculation of JSNCASE_TP
should ignore NA values and can take on the values 0,12,24, or 36.
require(dplyr)
set.seed(1)
N = 10
mJSW_BL <- runif(N,0.1,2)
mJSW_12 <- runif(N,0.1,2)
mJSW_24 <- runif(N,0.1,2)
mJSW_36 <- runif(N,0.1,2)
#Randomly set some values to NA
mJSW_12[sample(N,2)] <- NA
mJSW_36[sample(N,1)] <- NA
#Create dataframe
df <- data.frame(mJSW_BL,mJSW_12,mJSW_24,mJSW_36)
df2 <- df %>%
#Create variables indicating decrease from BL
mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
mJSW_BLto24 = mJSW_BL - mJSW_24,
mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
#JSN case - decrease by 0.7 from BL
mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
JSNCASE_24 = (mJSW_BLto24>=0.7),
JSNCASE_36 = (mJSW_BLto36>=0.7)) %>%
#Which timepoint did JSN first occur?
mutate(JSNCASE_TP = sum(12*JSNCASE_12,
24*(JSNCASE_24 & !JSNCASE_12),
36*(JSNCASE_36 & !(JSNCASE_12 | JSNCASE_24)),
na.rm=TRUE))
ISSUES
In data df2
, take for example, row 4, where JSNCASE_12
, JSNCASE_24
, and JSNCASE_36
are all TRUE
, but JSNCASE_TP=36
. It should be JSNCASE_TP=12
. Additionally, take row 6, where JSNCASE_12=NA
, JSNCASE_24=TRUE
, and JSNCASE_36=FALSE
. I should get JSNCASE_TP=24
. Maybe I'm missing something basic, but I've tried several ways and haven't produced the desired result. The values of JSNCASE_TP
for the 10 rows should be 0,0,0,12,0,24,24,0,0,0
.
EDIT Thanks to @Dave2e's comments, the code below works:
df2 <- df %>%
#Create variables indicating decrease from BL
mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
mJSW_BLto24 = mJSW_BL - mJSW_24,
mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
#JSN case - decrease by 0.7 from BL
mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
JSNCASE_24 = (mJSW_BLto24>=0.7),
JSNCASE_36 = (mJSW_BLto36>=0.7)) %>%
rowwise() %>%
#Which timepoint did JSN first occur?
mutate(JSNCASE_TP = sum(12*JSNCASE_12,
24*(JSNCASE_24 & (!JSNCASE_12| is.na(JSNCASE_12))),
36*(JSNCASE_36 & ((!JSNCASE_12 | is.na(JSNCASE_12)) &
(!JSNCASE_24 | is.na(JSNCASE_24)))),
na.rm=TRUE))
Upvotes: 1
Views: 371
Reputation: 24089
Having the the NA mixed with the TRUE/FALSE does complicates things.
Here is a hack using the apply
function. Basically finds the first column with TRUE in it then multiples by 12 to get the proper time. Since it is possible all columns are FALSE, it needs to check and handle cases where inf values are returned by the min
function.
df2 <- df %>%
#Create variables indicating decrease from BL
mutate(mJSW_BLto12 = mJSW_BL - mJSW_12,
mJSW_BLto24 = mJSW_BL - mJSW_24,
mJSW_BLto36 = mJSW_BL - mJSW_36) %>%
#JSN case - decrease by 0.7 from BL
mutate(JSNCASE_12 = (mJSW_BLto12>=0.7),
JSNCASE_24 = (mJSW_BLto24>=0.7),
JSNCASE_36 = (mJSW_BLto36>=0.7))
df2$JSNCASE_TP<-12*apply(df2[,8:10], 1, function(x){ ifelse(is.infinite(min(which(x==TRUE))), 0, min(which(x==TRUE)) )})
I'm sure there is a possible dplyr version of this.
Upvotes: 1
Reputation: 13135
We can use mutate_at
and apply
to get the column name if it exists otherwise 0.
library(dplyr)
df %>%
mutate_at(vars(matches('.*_\\d+')), list(ind=~mJSW_BL-.>=0.7)) %>%
mutate(JSNCASE_TP = apply(.[grepl('.*_ind',names(.))], 1, function(x){
x <- x[!is.na(x)]
ifelse(all(!x), 0, names(x)[which.max(x)])
}))
mJSW_BL mJSW_12 mJSW_24 mJSW_36 mJSW_12_ind mJSW_24_ind mJSW_36_ind JSNCASE_TP
1 0.6044665 0.4913517 1.8759399 1.0159522 FALSE FALSE FALSE 0
2 0.8070354 0.4354578 0.5030708 1.2391751 FALSE FALSE FALSE 0
3 1.1884214 1.4053434 1.3381802 1.0377285 FALSE FALSE FALSE 0
4 1.8255948 0.8297971 0.3385547 0.4538134 TRUE TRUE TRUE mJSW_12_ind
5 0.4831957 1.5626987 0.6077193 1.6720093 FALSE FALSE FALSE 0
6 1.8069404 NA 0.8336168 1.3700868 NA TRUE FALSE mJSW_24_ind
7 1.8948830 1.4634752 0.1254416 1.6090557 FALSE TRUE FALSE mJSW_24_ind
8 1.3555158 1.9846216 0.8265371 NA FALSE FALSE NA 0
9 1.2953167 NA 1.7524126 1.4750508 NA FALSE FALSE 0
10 0.2173939 1.5771459 0.7466631 0.8814214 FALSE FALSE FALSE 0
However, it's better to transfer your data into tidy data, do the analysis then transfer back into wide format. Here is one way
library(dplyr)
library(tidyr)
df %>% rownames_to_column(var = 'id') %>%
gather(key,value,-id) %>%
group_by(id) %>%
mutate(Ind = ifelse(is.na(value), FALSE, (first(value)-value)>=0.7), JSNCASE_TP = ifelse(any(Ind),key[Ind],'0')) %>%
select(-Ind) %>% ungroup() %>%
spread(key, value) %>%
arrange(as.numeric(id))
Upvotes: 1