Reputation: 55
I have data for 3 women with measures repeated across 3 observations (menstrual cycles) each. There is lots of missing data.
I want to look at the relationship between progesterone this cycle and hb (haemoglobin) last cycle.
My data currently looks like this:
data = data.frame(id = c(1,1,1,2,2,2,3,3,3),
progesterone = c(150, 140, 130, 145, 130, NA, 150, 150, NA),
hb_this_cycle = c(9, 8, 9, NA, 9, 10, 9, 8, 7))
# id progesterone hb
# 1 1 150 9
# 2 1 140 8
# 3 1 130 9
# 4 2 NA NA
# 5 2 130 9
# 6 2 NA 10
# 7 3 150 9
# 8 3 150 8
# 9 3 NA 7
I want to add a variable which represents hb (haemoglobin) from the cycle before. However, I want to do this within IDs. I also need it to be able to handle lots of NAs.
I would like the data therefore to look like this afterwards ideally:
data_ideal = data.frame(id = c(1,1,1,2,2,2,3,3,3),
progesterone = c(150, 140, 130, 145, 130, NA, 150, 150, NA),
hb_this_cycle = c(9, 8, 9, NA, 9, 10, 9, 8, 7),
hb_last_cycle = c(NA, 9, 8, NA, NA, 9, NA, 9, 8))
# id progesterone hb_this_cycle hb_last_cycle
# 1 1 150 9 NA
# 2 1 140 8 9
# 3 1 130 9 8
# 4 2 145 NA NA
# 5 2 130 9 NA
# 6 2 NA 10 9
# 7 3 150 9 NA
# 8 3 150 8 9
# 9 3 NA 7 8
Any help would be super appreciated! Thanks.
Upvotes: 0
Views: 30
Reputation: 1577
You can do it like below. Note that you might have an error in your expected output as the third group receives the last value of the second group which might not be what you want?
data_ideal %>% group_by(id) %>% mutate(new = lag(hb_this_cycle))
# A tibble: 9 x 5
# Groups: id [3]
id progesterone hb_this_cycle hb_last_cycle new
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 150 9 NA NA
2 1 140 8 9 9
3 1 130 9 8 8
4 2 145 NA NA NA
5 2 130 9 NA NA
6 3 NA 10 9 NA
7 3 150 9 NA 10
8 3 150 8 9 9
9 3 NA 7 8 8
Upvotes: 1