epifan
epifan

Reputation: 55

Finding the closest values above and below a certain date by group

I have a dataframe of longitudinal measurements and am looking to find the closest measurements that were taken both above and below a certain date. My date column is coded as the number of days since the beginning of the study.

For each participant, I want to find the one measurement that is the closest value BELOW 100 days of the study and the one measurement that is the closest value ABOVE 100 days of the study. If the participant has a measurement at 100 days, then I want that value returned.

My dataframe looks like this:

df <- data_frame(id = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3), 
measures = c(10, 11, 11.4, 11.7, 11.8, 4.1, 4.3, 4.7, 13.3, 13.2, 13.5, 13.9, 14, 14.1),
days = c(5, 45, 60, 94, 104, 21, 76, 115, 10, 26, 73, 100, 132, 154))

df
# id measures days
# 1     10       5
# 1     11      45
# 1     11.4    60
# 1     11.7    94
# 1     11.8   104
# 2      4.1    21
# 2      4.3    76
# 2      4.7   115
# 3     13.3    10
# 3     13.2    26
# 3     13.5    73
# 3     13.9   100
# 3     14     132
# 3     14.1   154

For ID 1, I want it to return days 94 and 104 and measurements 11.7 and 11.8. For ID 2, I want it to return days 76 and 115 and measurements 4.3 and 4.7. For ID 3, I want it to return day 100 twice and measurement 13.9 twice.

Here's what I have so far:

library(dplyr)
df %>% group_by(id) %>% 
summarise(below = max(df$days[df$days <= 100]),
          above = min(df$days[df$days >= 100]),
          below_msrmt = df$measures[which(df$days == below)],
          above_msrmt = df$measures[which(df$days == above)])

But it looks like the code isn't reading the group_by argument because it keeps giving me this:

# id below above below_msrmt above_msrmt
# 1    100   100        13.9        13.9
# 2    100   100        13.9        13.9
# 3    100   100        13.9        13.9

Can anyone help me with this?

Upvotes: 2

Views: 73

Answers (1)

akrun
akrun

Reputation: 887721

If we remove the df$, it works because df$days or df$measures selects the entire column and not just the values of that columns within each group. Thus, we get same value for all groups as it is the summarised value for the entire column

df %>% group_by(id) %>% 
    summarise(below = max(days[days <= 100]),
      above = min(days[days >= 100]),
      below_msrmt = measures[which(days == below)],
      above_msrmt = measures[which(days == above)])

Upvotes: 1

Related Questions