slp255
slp255

Reputation: 11

How to find observations within a certain time range of each other in R

I have a dataset with ID, date, days of life, and medication variables. Each ID has multiple observations indicating different administrations of a certain drug. I want to find UNIQUE meds that were administered within 365 days of each other. A sample of the data frame is as follows:

ID    date          dayoflife    meds
1     2003-11-24    16361        lasiks
1     2003-11-24    16361        vigab
1     2004-01-09    16407        lacos
1     2013-11-25    20015        pheno
1     2013-11-26    20016        vigab
1     2013-11-26    20016        lasiks
2     2008-06-05    24133        pheno
2     2008-04-07    24074        vigab
3     2014-11-25    8458         pheno
3     2014-12-22    8485         pheno

I expect the outcome to be:

ID    N
1     3
2     2
3     1

indicating that individual 1 had a max of 3 different types of medications administered within 365 days of each other. I am not sure if it is best to use days of life or the date to get to this expected outcome.Any help is appreciated

Upvotes: 1

Views: 558

Answers (2)

arg0naut91
arg0naut91

Reputation: 14774

You can try:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(date = as.Date(date),
         lag_date = abs(date - lag(date)) <= 365,
         lead_date = abs(date - lead(date)) <= 365) %>%
  mutate_at(vars(lag_date, lead_date), ~ ifelse(., ., NA)) %>%
  filter(coalesce(lag_date, lead_date)) %>%
  summarise(N = n_distinct(meds))

Output:

# A tibble: 3 x 2
     ID     N
  <int> <int>
1     1     2
2     2     2
3     3     1

Upvotes: 1

akrun
akrun

Reputation: 887891

An option would be to convert the 'date' to Date class, grouped by 'ID', get the absolute difference of 'date' and the lag of the column, check whether it is greater than 365, create a grouping index with cumsum, get the number of distinct elements of 'meds' in summarise

library(dplyr)
df1 %>% 
   mutate(date = as.Date(date)) %>%
   group_by(ID) %>% 
   mutate(diffd = abs(as.numeric(difftime(date, lag(date, default = first(date)),
               units = 'days')))) %>%
   group_by(grp = cumsum(diffd > 365), add = TRUE) %>%
   summarise(N = n_distinct(meds)) %>%
   group_by(ID) %>%
   summarise(N = max(N))
# A tibble: 3 x 2
#     ID     N
#  <int> <int>
#1     1     2
#2     2     2
#3     3     1

Upvotes: 1

Related Questions