Reputation: 137
I am trying to analyze longitudinal data. Each subject has come in for our study at least once and up to 3 times. I need to do comparisons of scores across visits to see if their treatments helped diminish the symptoms.
For now, I want to set up columns that indicate whether the subject has a follow-up visit: one column indicating whether the subject came for a 2nd visit, and another indicating whether they came back for a 3rd visit.
visit_id subject_id visit_number Measure1 Measure2 ...
1 Subject1 1
2 Subject2 1
3 Subject1 2
4 Subject3 1
5 Subject1 3
I tried using sapply to loop through all the visits by subject ID and populate the columns according to whether that subject has a 2nd visit and whether they have a 3rd visit.
I also tried a for loop, but in each case I'm not sure how to tell it to loop through all instances of that subject and then select items to compare (i.e., the existence of a specific visit number).
sapply(dat$subject_id, function(x) {
if(dat$visit_number == 2) {followup2 <- "yes"
}else {followup2 <- "no"}
if(dat$visit_number == 3) {followup3 <- "yes"
}else {followup3 <- "no"}
})
visit_id subject_id visit_number followup2 followup3
1 Subject1 1 yes yes
3 Subject1 2 yes yes
5 Subject1 3 yes yes
2 Subject2 1 yes no
6 Subject2 2 yes no
4 Subject3 1 no no
I intend to use similar logic to go through each subject and compare their symptoms across visits, comparing visit 1 with visit 2 and then visit 2 with visit 3.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "visit_id subject_id visit_number
1 Subject1 1
3 Subject1 2
5 Subject1 3
2 Subject2 1
6 Subject2 2
4 Subject3 1")
Upvotes: 0
Views: 307
Reputation: 20811
Since you are repeating the same task over and over, you can make a function to do the work and then loop over the moving parts.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "visit_id subject_id visit_number
1 Subject1 1
3 Subject1 2
5 Subject1 3
2 Subject2 1
6 Subject2 2
4 Subject3 1")
This function splits visit by each unique id and checks whether the maximum visit is greater than or equal to num:
f <- function(id, visit, num) {
ave(visit, id, FUN = function(x) if (max(x) >= num) 'yes' else 'no')
}
Make some test cases to make sure it is working
with(dat, f(subject_id, visit_number, 1))
# [1] "yes" "yes" "yes" "yes" "yes" "yes"
with(dat, f(subject_id, visit_number, 2))
# [1] "yes" "yes" "yes" "yes" "yes" "no"
with(dat, f(subject_id, visit_number, 3))
# [1] "yes" "yes" "yes" "no" "no" "no"
Then decide what you need to loop over. You can also assign new columns in your data frame for each loop iteration in one go:
idx <- 2:3
dat[, paste0('followup', idx)] <- lapply(idx, function(x)
f(dat$subject_id, dat$visit_number, x))
# visit_id subject_id visit_number followup2 followup3
# 1 1 Subject1 1 yes yes
# 2 3 Subject1 2 yes yes
# 3 5 Subject1 3 yes yes
# 4 2 Subject2 1 yes no
# 5 6 Subject2 2 yes no
# 6 4 Subject3 1 no no
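As an aside, it may help to see the work that ave() does inside f spelled out by hand. The following is only a sketch of the same split-then-recombine idea for the 2nd visit; the names by_subject, flags, and followup2_manual are made up for illustration and are not part of the answer above.

```r
# Same example data as above
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "visit_id subject_id visit_number
1 Subject1 1
3 Subject1 2
5 Subject1 3
2 Subject2 1
6 Subject2 2
4 Subject3 1")

# Step 1: split the visit numbers into one vector per subject
by_subject <- split(dat$visit_number, dat$subject_id)

# Step 2: reduce each subject's visits to a single yes/no flag
# (here: did this subject reach at least visit 2?)
flags <- vapply(by_subject,
                function(x) if (max(x) >= 2) "yes" else "no",
                character(1))

# Step 3: look up each row's subject in the named flag vector,
# so every row of that subject receives the same value
followup2_manual <- unname(flags[dat$subject_id])
followup2_manual
```

On the example data this should give the same vector as with(dat, f(subject_id, visit_number, 2)); ave() simply does the splitting and the recombining in a single call.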
Upvotes: 1
Reputation: 312
Rather than trying to do this all in one go, I'd separate it to first identifying if a subject had a second (or third) visit or not, and then adding a column using that data.
To do the first:
subj_2_visit <- dat$subject_id[dat$visit_number == 2]
Now subj_2_visit will be a vector of all subjects who have had a second visit. Then you can use ifelse() to create the new column:
dat$followup2 <- ifelse(dat$subject_id %in% subj_2_visit, "Yes", "No")
The same approach works for the third visit: change the check in the first step to dat$visit_number == 3 and assign the result to a followup3 column.
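Put together for both follow-up visits, the two steps might look like the sketch below. It assumes the example data from the question; subj_3_visit is a name introduced here by analogy with subj_2_visit.

```r
# Example data from the question
dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "visit_id subject_id visit_number
1 Subject1 1
3 Subject1 2
5 Subject1 3
2 Subject2 1
6 Subject2 2
4 Subject3 1")

# Step 1: collect the subjects that have a row for visit 2 / visit 3
subj_2_visit <- dat$subject_id[dat$visit_number == 2]
subj_3_visit <- dat$subject_id[dat$visit_number == 3]

# Step 2: flag every row whose subject appears in the matching vector
dat$followup2 <- ifelse(dat$subject_id %in% subj_2_visit, "Yes", "No")
dat$followup3 <- ifelse(dat$subject_id %in% subj_3_visit, "Yes", "No")
dat
```

Note that this checks for the existence of a specific visit number, so a subject who skipped visit 2 but attended visit 3 would get "No" for followup2 and "Yes" for followup3. A check based on the maximum visit number would flag both as yes in that case.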
Upvotes: 1