Reputation: 69
I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (individual per patient). A dummy dataset can be seen below
df <- data.frame(m1 = c("1","1","1","2","2","2"),
patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
biopsy=c("1","2","3","1","2","3"))
I have then calculated the time difference between patodate and baselinedate
df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)
df <- df%>%
group_by(m1) %>%
mutate(diff = baselinedate-recdate)
My question is now - I want to add a new column called 'status' which shows (by group m1) the 'biopsy' result with the time difference closest to 0. The end result would be
df <- data.frame(m1 = c("1","1","1","2","2","2"),
patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
biopsy=c("1","2","3","1","2","3"),
status=c("3","3","3","2","2","2"))
I hope someone understands the issue and is able to help. Many thanks
Kind regards,
Tobias Berg
Upvotes: 4
Views: 608
Reputation: 78917
Here is an alternative way:
library(dplyr)
library(lubridate)
df %>%
group_by(m1) %>%
mutate(across(contains("date"), ymd),
helper = abs(difftime(baselinedate,patodate))) %>%
mutate(status = biopsy[helper==min(helper)]) %>%
select(-helper)
m1 patodate baselinedate biopsy status
<chr> <date> <date> <chr> <chr>
1 1 2013-06-03 2018-11-09 1 3
2 1 2014-01-06 2018-11-09 2 3
3 1 2018-11-23 2018-11-09 3 3
4 2 2004-03-03 2018-07-24 1 2
5 2 2018-06-25 2018-07-24 2 2
6 2 2018-12-19 2018-07-24 3 2
Upvotes: 2
Reputation: 886938
We may do
library(dplyr)
df %>%
group_by(m1) %>%
mutate(status = abs(patodate - baselinedate),
status = which(status == min(status))[1]) %>%
ungroup
-output
# A tibble: 6 × 5
m1 patodate baselinedate biopsy status
<chr> <date> <date> <chr> <int>
1 1 2013-06-03 2018-11-09 1 3
2 1 2014-01-06 2018-11-09 2 3
3 1 2018-11-23 2018-11-09 3 3
4 2 2004-03-03 2018-07-24 1 2
5 2 2018-06-25 2018-07-24 2 2
6 2 2018-12-19 2018-07-24 3 2
Upvotes: 2
Reputation: 388797
You can get index of minimum absolute value of difference between the dates for each group.
library(dplyr)
df %>%
group_by(m1) %>%
mutate(status = which.min(abs(patodate - baselinedate))) %>%
ungroup
# m1 patodate baselinedate biopsy status
# <chr> <date> <date> <chr> <int>
#1 1 2013-06-03 2018-11-09 1 3
#2 1 2014-01-06 2018-11-09 2 3
#3 1 2018-11-23 2018-11-09 3 3
#4 2 2004-03-03 2018-07-24 1 2
#5 2 2018-06-25 2018-07-24 2 2
#6 2 2018-12-19 2018-07-24 3 2
Upvotes: 1