Reputation: 37
I have a long-formatted set of longitudinal data with two variables that change over time (cognitive_score
and motor_score
), subject id (labeled subjid
) and age of each subject in days at the moment of measurement(labeled agedays
). Measurements were taken twice.
I want to transform it to wide-formatted longitudinal dataset.
The problem is that agedays
measurements are unique for each subject, and the only way to see which measurement entry was the first, and which was the second, is to check where the agedays is higher (agedays
higher than in the other entry means second measurement, lower agedays
means first measurement).
We thus have this dataset:
subjid agedays cognitive_score motor_score
<int> <int> <dbl> <dbl>
1 4900001 457 0.338 0.176
2 4900001 1035 0.191 0.216
3 4900002 639 0.25 0.176
4 4900002 1248 0.176 0.353
5 4900003 335 0.103 0.196
6 4900003 913 0.176 0.196
And what I tried was using reshape:
reshape(dataset_col, direction = "wide", idvar = "subjid", timevar = "agedays", v.names = c("cognitive_score", "motor_score"))
Where dataset_col
is the name of the dataset.
What it does, however, is adding these two columns:
The numbers in the name of the columns seem to be the values of agedays
variable.
Any advice on how I can do this?
Upvotes: 1
Views: 401
Reputation: 2283
With libraries dplyr
and tidyr
:
You can use group_by
and mutate
to determine which of the ids is the first measurement and which is the second measurement. In this case I used an ifelse
statement to determine which measurement is the first, this only works if you have exactly 2 measurements for each subjid.
Then you can use pivot_wider
to pivot your data to wide format based on your newly created names column.
library(dplyr)
library(tidyr)
df %>%
group_by(subjid) %>%
mutate(names = ifelse(agedays == min(agedays),"first","second")) %>%
pivot_wider(names_from = names, values_from = c(agedays,cognitive_score,motor_score))
# A tibble: 3 × 7
# Groups: subjid [3]
# subjid agedays_first agedays_second cognitive_score_first cognitive_score_second motor_score_first motor_score_second
# <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
# 1 4900001 457 1035 0.338 0.191 0.176 0.216
# 2 4900002 639 1248 0.25 0.176 0.176 0.353
# 3 4900003 335 913 0.103 0.176 0.196 0.196
Upvotes: 0