Rose Stevens
Rose Stevens

Reputation: 55

When using repeated measures within IDs in R, how do I create a variable for a value from the observation before?

I have data for 3 women with measures repeated across 3 observations (menstrual cycles) each. There is lots of missing data.

I want to look at the relationship between progesterone this cycle and hb (haemoglobin) last cycle.

My data currently looks like this:

data = data.frame(id = c(1,1,1,2,2,2,3,3,3), 
                  progesterone = c(150, 140, 130, 145, 130, NA, 150, 150, NA),
                  hb_this_cycle = c(9, 8, 9, NA, 9, 10, 9, 8, 7))

#    id progesterone hb
# 1  1          150  9
# 2  1          140  8
# 3  1          130  9
# 4  2           NA NA
# 5  2          130  9
# 6  2           NA 10
# 7  3          150  9
# 8  3          150  8
# 9  3           NA  7

I want to add a variable which represents hb (haemoglobin) from the cycle before. However, I want to do this within IDs. I also need it to be able to handle lots of NAs.

I would like the data therefore to look like this afterwards ideally:

data_ideal = data.frame(id = c(1,1,1,2,2,2,3,3,3), 
                  progesterone = c(150, 140, 130, 145, 130, NA, 150, 150, NA),
                  hb_this_cycle = c(9, 8, 9, NA, 9, 10, 9, 8, 7),
                  hb_last_cycle = c(NA, 9, 8, NA, NA, 9, NA, 9, 8))

#   id progesterone hb_this_cycle hb_last_cycle
# 1  1          150             9            NA
# 2  1          140             8             9
# 3  1          130             9             8
# 4  2          145            NA            NA
# 5  2          130             9            NA
# 6  2           NA            10             9
# 7  3          150             9            NA
# 8  3          150             8             9
# 9  3           NA             7             8

Any help would be super appreciated! Thanks.

Upvotes: 0

Views: 30

Answers (1)

Taufi
Taufi

Reputation: 1577

You can do it like below. Note that you might have an error in your expected output as the third group receives the last value of the second group which might not be what you want?

data_ideal %>% group_by(id) %>% mutate(new = lag(hb_this_cycle))
# A tibble: 9 x 5
# Groups:   id [3]
     id progesterone hb_this_cycle hb_last_cycle   new
  <dbl>        <dbl>         <dbl>         <dbl> <dbl>
1     1          150             9            NA    NA
2     1          140             8             9     9
3     1          130             9             8     8
4     2          145            NA            NA    NA
5     2          130             9            NA    NA
6     3           NA            10             9    NA
7     3          150             9            NA    10
8     3          150             8             9     9
9     3           NA             7             8     8

Upvotes: 1

Related Questions