identify age at 1st occurrence of each value in long data frame

Question

I have a long format dataframe of responses to a 9 repeated question about puberty status vb_ asked at approximately yearly intervals.

Each year participants were asked to rate their development on a scale from 1 to 5, with 1 being least developed and 5 being most developed.

I would like to use R to create 5 new age columns age_vb_1:age_vb_5 that correspond to the the first age at which each score from 1-5 in vb_ is reached (or to take "NA" if that score is never recorded)

Here is some fake example data

vb <- structure(list(id = c(3L, 3L, 3L, 3L, 3L, 3L, 14L, 14L, 14L, 
                        14L, 19L, 19L, 19L, 19L, 19L, 19L, 23L, 23L, 23L, 23L, 36L, 36L, 
                        36L, 36L, 36L, 36L, 36L, 39L, 39L, 39L, 39L, 39L, 39L, 39L, 40L, 
                        40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 48L, 48L, 48L, 48L, 48L, 
                        48L, 48L, 48L, 48L, 65L, 65L, 65L, 65L, 65L, 67L, 67L, 67L, 67L, 
                        67L, 67L, 67L, 67L), vb = c(1, 1, 3, 4, 4, 5, 3, 3, 4, 5, 1, 
                                                    1, 2, 2, 3, 5, 1, 1, 2, 4, 1, 2, 4, 4, 4, 5, 4, 1, 1, 1, 2, 4, 
                                                    5, 5, 1, 1, 1, 1, 3, 4, 4, 4, 5, 2, 2, 2, 3, 4, 5, 5, 5, 5, 1, 
                                                    5, 5, 5, 5, 2, 3, 3, 3, 4, 4, 4, 5), age = c(8.333333, 9.75, 
                                                                                                 11.666667, 13.166667, 14.833333, 16, 13.25, 14.833333, 15.25, 
                                                                                                 17, 9.583333, 10.666667, 11.666667, 13.083333, 14.666667, 17, 
                                                                                                 8.333333, 10.833333, 13.083333, 14.916667, 9.583333, 10.666667, 
                                                                                                 13.083333, 14.666667, 15.416667, 16.333334, 17, 8.166667, 9.666667, 
                                                                                                 10.666667, 11.666667, 13.083333, 14.666667, 16.25, 8.416667, 
                                                                                                 9.833333, 10.75, 11.666667, 13.083333, 15, 15.416667, 16.166666, 
                                                                                                 17, 8.416667, 9.666667, 10.666667, 11.666667, 13.083333, 14.916667, 
                                                                                                 15.5, 16, 17.083334, 9.583333, 14.583333, 15.25, 16, 17.25, 9.583333, 
                                                                                                 10.666667, 11.75, 13.083333, 14.583333, 15.166667, 16.25, 17)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                     -65L), .Names = c("id", "vb", "age"))

Example expected output

id vb_ age age_vb_1  age_vb_2  age_vb_3  age_vb_4   age_vb_5
1  1    8     8         NA       11         NA         NA       
1  1    9     8         NA       11         NA         NA
1  3   11     8         NA       11         NA         NA
1  4   13     8         NA       11         NA         NA
2  1   9      9         10       13         14         15
2  2   10     9         10       13         14         15
2  3   13     9         10       13         14         15
2  4   14     9         10       13         14         15
2  5   15     9         10       13         14         15
2  5   17     9         10       13         14         15

markus · Accepted Answer

We can use reshape and merge to get desired output.

merge(vb, 
      reshape(vb, idvar = "id", timevar = "vb", direction = "wide", sep = "_vb_"),
      by = "id")
#   id vb       age age_vb_1 age_vb_3 age_vb_4 age_vb_5  age_vb_2
#1   3  1  8.333333 8.333333 11.66667 13.16667 16.00000        NA
#2   3  1  9.750000 8.333333 11.66667 13.16667 16.00000        NA
#3   3  3 11.666667 8.333333 11.66667 13.16667 16.00000        NA
#4   3  4 13.166667 8.333333 11.66667 13.16667 16.00000        NA
#5   3  4 14.833333 8.333333 11.66667 13.16667 16.00000        NA
#6   3  5 16.000000 8.333333 11.66667 13.16667 16.00000        NA
#7  14  3 13.250000       NA 13.25000 15.25000 17.00000        NA
#8  14  3 14.833333       NA 13.25000 15.25000 17.00000        NA
#9  14  4 15.250000       NA 13.25000 15.25000 17.00000        NA
#10 14  5 17.000000       NA 13.25000 15.25000 17.00000        NA
#11 19  1  9.583333 9.583333 14.66667       NA 17.00000 11.666667
#12 19  1 10.666667 9.583333 14.66667       NA 17.00000 11.666667
#13 19  2 11.666667 9.583333 14.66667       NA 17.00000 11.666667
#14 19  2 13.083333 9.583333 14.66667       NA 17.00000 11.666667
#15 19  3 14.666667 9.583333 14.66667       NA 17.00000 11.666667
#16 19  5 17.000000 9.583333 14.66667       NA 17.00000 11.666667
# ...

reshape creates a new dataframe with columns age_vb_1 to age_vb_5 which we merge with vb.

identify age at 1st occurrence of each value in long data frame

Answers (1)

Related Questions