Reputation: 415
I have a long format dataframe of responses to a 9 repeated question about puberty status vb_
asked at approximately yearly intervals.
Each year participants were asked to rate their development on a scale from 1 to 5, with 1 being least developed and 5 being most developed.
I would like to use R to create 5 new age columns age_vb_1:age_vb_5
that correspond to the the first age at which each score from 1-5 in vb_
is reached (or to take "NA" if that score is never recorded)
Here is some fake example data
vb <- structure(list(id = c(3L, 3L, 3L, 3L, 3L, 3L, 14L, 14L, 14L,
14L, 19L, 19L, 19L, 19L, 19L, 19L, 23L, 23L, 23L, 23L, 36L, 36L,
36L, 36L, 36L, 36L, 36L, 39L, 39L, 39L, 39L, 39L, 39L, 39L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 65L, 65L, 65L, 65L, 65L, 67L, 67L, 67L, 67L,
67L, 67L, 67L, 67L), vb = c(1, 1, 3, 4, 4, 5, 3, 3, 4, 5, 1,
1, 2, 2, 3, 5, 1, 1, 2, 4, 1, 2, 4, 4, 4, 5, 4, 1, 1, 1, 2, 4,
5, 5, 1, 1, 1, 1, 3, 4, 4, 4, 5, 2, 2, 2, 3, 4, 5, 5, 5, 5, 1,
5, 5, 5, 5, 2, 3, 3, 3, 4, 4, 4, 5), age = c(8.333333, 9.75,
11.666667, 13.166667, 14.833333, 16, 13.25, 14.833333, 15.25,
17, 9.583333, 10.666667, 11.666667, 13.083333, 14.666667, 17,
8.333333, 10.833333, 13.083333, 14.916667, 9.583333, 10.666667,
13.083333, 14.666667, 15.416667, 16.333334, 17, 8.166667, 9.666667,
10.666667, 11.666667, 13.083333, 14.666667, 16.25, 8.416667,
9.833333, 10.75, 11.666667, 13.083333, 15, 15.416667, 16.166666,
17, 8.416667, 9.666667, 10.666667, 11.666667, 13.083333, 14.916667,
15.5, 16, 17.083334, 9.583333, 14.583333, 15.25, 16, 17.25, 9.583333,
10.666667, 11.75, 13.083333, 14.583333, 15.166667, 16.25, 17)), class = "data.frame", row.names = c(NA,
-65L), .Names = c("id", "vb", "age"))
Example expected output
id vb_ age age_vb_1 age_vb_2 age_vb_3 age_vb_4 age_vb_5
1 1 8 8 NA 11 NA NA
1 1 9 8 NA 11 NA NA
1 3 11 8 NA 11 NA NA
1 4 13 8 NA 11 NA NA
2 1 9 9 10 13 14 15
2 2 10 9 10 13 14 15
2 3 13 9 10 13 14 15
2 4 14 9 10 13 14 15
2 5 15 9 10 13 14 15
2 5 17 9 10 13 14 15
Upvotes: 0
Views: 46
Reputation: 26343
We can use reshape
and merge
to get desired output.
merge(vb,
reshape(vb, idvar = "id", timevar = "vb", direction = "wide", sep = "_vb_"),
by = "id")
# id vb age age_vb_1 age_vb_3 age_vb_4 age_vb_5 age_vb_2
#1 3 1 8.333333 8.333333 11.66667 13.16667 16.00000 NA
#2 3 1 9.750000 8.333333 11.66667 13.16667 16.00000 NA
#3 3 3 11.666667 8.333333 11.66667 13.16667 16.00000 NA
#4 3 4 13.166667 8.333333 11.66667 13.16667 16.00000 NA
#5 3 4 14.833333 8.333333 11.66667 13.16667 16.00000 NA
#6 3 5 16.000000 8.333333 11.66667 13.16667 16.00000 NA
#7 14 3 13.250000 NA 13.25000 15.25000 17.00000 NA
#8 14 3 14.833333 NA 13.25000 15.25000 17.00000 NA
#9 14 4 15.250000 NA 13.25000 15.25000 17.00000 NA
#10 14 5 17.000000 NA 13.25000 15.25000 17.00000 NA
#11 19 1 9.583333 9.583333 14.66667 NA 17.00000 11.666667
#12 19 1 10.666667 9.583333 14.66667 NA 17.00000 11.666667
#13 19 2 11.666667 9.583333 14.66667 NA 17.00000 11.666667
#14 19 2 13.083333 9.583333 14.66667 NA 17.00000 11.666667
#15 19 3 14.666667 9.583333 14.66667 NA 17.00000 11.666667
#16 19 5 17.000000 9.583333 14.66667 NA 17.00000 11.666667
# ...
reshape
creates a new dataframe with columns age_vb_1
to age_vb_5
which we merge with vb
.
Upvotes: 1