Reputation: 41
disclosure - this is my first SO question, my apologies if this is a repeat question but I have looked for a while now and have not found an answer to this particular scenario
R version: 3.4.2
I want an efficient way of grouping data by a certain identifier and then summarize based on a condition - dynamically for each row. Specifically, group by ID and then sum how many instances another variable occurred (urgent visits) IF the other instance was within 1 year of the current row.
Here is an example of what the data looks like to start:
Updated to include example of 2 urgent cases
library(lubridate)
> dat <- data.frame("ID" = c(6,6,6,7,7,10,11,11,11),
"Admit_Dt" = as.Date(c('2013-08-12', '2013-12-12', '2016-01-03','2011-04-01', '2011-09-20','2012-02-19','2014-06-24','2014-08-12','2014-09-01')),
"Urgent" = c(0,1,1,1,0,0,1,1,1))
> dat
| ID | Admit_Dt | Urgent|
| 6 | 2013-08-12 | 1|
| 6 | 2013-12-12 | 0|
| 6 | 2016-01-03 | 1|
| 7 | 2011-04-01 | 1|
| 7 | 2011-09-20 | 0|
| 10 | 2012-02-19 | 0|
| 11 | 2014-06-24 | 1|
| 11 | 2014-08-12 | 1|
| 11 | 2014-09-01 | 1|
I want to first group by ID and then sum how many urgent visits occurred within one year of each Admit_Dt for a given group.
This over complicated code below produces what I want but the dataset I am working with is very large and I this is pretty inefficient. I'm curious if there is a method using 'dplyr' to achieve what I am trying to do:
> dat$Urgent_1yrSum <- unlist(sapply(1:length(unique(dat$ID)), function(i) {
grouped <- subset(dat, ID == unique(dat$ID)[i])
output <- do.call(rbind, lapply(1:nrow(grouped), function(y){
urgent_sum_1year <- sum(grouped[grouped$Admit_Dt < grouped$Admit_Dt[y] & grouped$Admit_Dt > (grouped$Admit_Dt[y] - dyears(1)), "Urgent"])
}))
return(output)
}
))
> dat
| ID | Admit_Dt | Urgent| Urgent_1yrSum|
| 6 | 2013-08-12 | 1| 0|
| 6 | 2013-12-12 | 0| 1|
| 6 | 2016-01-03 | 1| 0|
| 7 | 2011-04-01 | 1| 0|
| 7 | 2011-09-20 | 0| 1|
| 10 | 2012-02-19 | 0| 0|
| 11 | 2014-06-24 | 1| 0|
| 11 | 2014-08-12 | 1| 1|
| 11 | 2014-09-01 | 1| 2|
Thanks for any help!!
Upvotes: 4
Views: 1002
Reputation: 4370
I couldn't get your code to work so I tried to see if I could get this to work. I first inner joined to get all combos of dates by ID. When you subtract them you can then use filter to see who received a visit within a year of each date and then summarise
.
dat <- data.frame("ID" = c(6,6,6,7,7,10,11,11),
"Admit_Dt" = c('2013-08-12', '2013-12-12', '2016-01-03','2011-04-01', '2011-09-20','2012-02-19','2014-06-24','2014-08-12'),
"Urgent" = c(0,1,1,1,0,0,1,1),stringsAsFactors = FALSE)
library(dplyr)
dat2 <- inner_join(dat,select(dat,ID,Admit_Dt,Urgent),by = "ID") %>%
#Inner Join by ID to get every combo of dates
#Subtract dates from eachother
mutate(datediff = as.Date(Admit_Dt.x) - as.Date(Admit_Dt.y),
ID = ID) %>%
#get dates that occured within one year of visit
#for each date
filter(datediff > 0 & datediff <= 365) %>%
#group by person
group_by(ID,Admit_Dt.x) %>%
#count urgent visits
mutate(urgent_visits = max(Urgent.x,Urgent.y,na.rm=TRUE)) %>%
summarise(vs = sum(urgent_visits))
#Join back on to df
dat3 <- left_join(dat,dat2,by = c("ID" = "ID", "Admit_Dt"="Admit_Dt.x"))
Upvotes: 5
Reputation: 764
Here's an answer using dplyr
, list columns, and purrr
. I'm assuming there are no duplicate ID
s and Admit_Dt
s otherwise I'm pretty sure this doesn't work right.
dat <- data.frame("ID" = c(6,6,6,7,7,10,11,11),
"Admit_Dt" = c('2013-08-12', '2013-12-12', '2016-01-03','2011-04-01', '2011-09-20','2012-02-19','2014-06-24','2014-08-12'),
"Urgent" = c(0,1,1,1,0,0,1,1), stringsAsFactors = F)
library(dplyr)
library(purrr)
library(lubridate)
isUrgentAndWithinYear <- function(urgent, date, date1){
sum( urgent == 1 & abs(as.numeric(difftime(date, date1, units = "weeks"))) < 52)
}
dat %>%
mutate(Admit_Dt = ymd(Admit_Dt)) %>%
group_by(ID) %>%
mutate(urgents = list(Urgent),
admits = list(Admit_Dt)
)%>%
group_by(ID, Admit_Dt) %>%
mutate(Urgent_1yrSum = map2_dbl(urgents, admits, ~ isUrgentAndWithinYear(.x, .y, Admit_Dt) )) %>%
mutate(Urgent_1yrSum = Urgent_1yrSum - Urgent) %>%
select(-urgents, -admits)
Upvotes: 1