Reputation: 415
I have 5 repeat measures called pub1:pub5
each taking a value of 1 to 4. Each was measured at a different age age1:age5
. That is, pub1
was measured at age1
....pub5
at age5
etc.
I would like to create a new variable age_pb2
that shows the age at which a value of 2 first occurred in pub
. For example, for individual x, age_pb2
will equal age3
if the first time a value of 2 is scored is in pub3
I have tried modifying previous code but not had much luck.
library(tidyverse)
#Example data
N <- 2000
data <- data.frame(id = 1:2000,age1 = rnorm(N,6:8),age2 = rnorm(N,7:9),age3 = rnorm(N,8:10),
age4 = rnorm(N,9:11),age5 = rnorm(N,10:12),pub1 = rnorm(N,1:2),pub2 = rnorm(N,1:2),
pub3 = rnorm(N,1:2),pub4 = rnorm(N,1:2),pub5 = rnorm(N,1:2))
data <- data %>% mutate_at(vars(starts_with("pub")), funs(round(replace(., .< 0, NA), 0)))
#New variable showing first age at getting a score of 2 (doesn't work)
i1 <- grepl('^pub', names(data)) # index for pub columns
i2 <- grepl('^age', names(data)) # index for age columns
data[paste0("age_pb2")] <- lapply(2, function(i) {
j1 <- max.col(data[i1] == i, 'first')
j2 <- rowSums(data[i1] == i) == 0
data[i2][cbind(seq_len(nrow(data)), j1 *(NA^j2))]
})
Upvotes: 0
Views: 561
Reputation: 383
set.seed(1)
N <- 2000
data <- data.frame(id = 1:2000,age1 = rnorm(N,6:8),age2 = rnorm(N,7:9),age3 = rnorm(N,8:10),
age4 = rnorm(N,9:11),age5 = rnorm(N,10:12),pub1 = rnorm(N,1:2),pub2 = rnorm(N,1:2),
pub3 = rnorm(N,1:2),pub4 = rnorm(N,1:2),pub5 = rnorm(N,1:2)) %>%
mutate_at(vars(starts_with("pub")), funs(round(replace(., .< 0, NA), 0))) %>%
mutate(age_pb2 = eval(parse(text = paste0("age", which.min(apply(select(., starts_with("pub")), 2, function(x) which(x == 2)[1]))))))
The way it works, you apply
over the pubs columns and take with which(x == 2)[1]
the first matched row per column, then take the which.min
to get the column index number (of pub respectively age) which you then paste
with "age" to assign (using eval(parse(text = variable name))
) the respective column.
E.g. here after apply
you get
[pub1 = 2, pub2 = 1, pub3 = 2, pub4 = 4, pub5 = 2]
which is the first occurrence of 2 per column. The earliest (which.min
) occurrence is for the second pub column, thus index is 2
. This pasted with "age" and eval parsed to mutate.
EDIT
It is probably more convenient to do it in a for loop for all age_pbi, or there is an easy solution in dplyr
that I am not aware of.
for (i in 1:5) {
index <- which.min(apply(select(data, starts_with("pub")), 2, function(x) which(x == i)[1]))
data[ ,paste0("age_pb", i)] <- data[ ,paste0("age", index)]
}
Note however, that which.min
takes the first minimum. E.g. pub1 and pub2 both have a 1 in the first row, so the above approach assigns age1 to age_pb1 whereas it could be age2 as well. I don't know what you want to do with this, so can't say what is a better option.
Upvotes: 1