Tobias Berg
Tobias Berg

Reputation: 69

Dplyr - choosing value in column based on lowest value in other column in R

I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (individual per patient). A dummy dataset can be seen below


df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"))

I have then calculated the time difference between patodate and baselinedate

df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)

df <- df%>%
  group_by(m1) %>%
  mutate(diff = baselinedate-recdate)

My question is now - I want to add a new column called 'status' which shows (by group m1) the 'biopsy' result with the time difference closest to 0. The end result would be

df <- data.frame(m1 = c("1","1","1","2","2","2"), 
                 patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"), 
                 baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
                 biopsy=c("1","2","3","1","2","3"),
                 status=c("3","3","3","2","2","2"))

I hope someone understands the issue and is able to help. Many thanks

Kind regards,

Tobias Berg

Upvotes: 4

Views: 608

Answers (3)

TarJae
TarJae

Reputation: 78917

Here is an alternative way:

library(dplyr)
library(lubridate)
df %>% 
  group_by(m1) %>% 
  mutate(across(contains("date"), ymd),
         helper = abs(difftime(baselinedate,patodate))) %>% 
  mutate(status = biopsy[helper==min(helper)]) %>% 
  select(-helper)
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>  <chr> 
1 1     2013-06-03 2018-11-09   1      3     
2 1     2014-01-06 2018-11-09   2      3     
3 1     2018-11-23 2018-11-09   3      3     
4 2     2004-03-03 2018-07-24   1      2     
5 2     2018-06-25 2018-07-24   2      2     
6 2     2018-12-19 2018-07-24   3      2  

Upvotes: 2

akrun
akrun

Reputation: 886938

We may do

library(dplyr)
df %>%
     group_by(m1) %>%
     mutate(status =  abs(patodate - baselinedate),
          status = which(status == min(status))[1]) %>% 
     ungroup

-output

# A tibble: 6 × 5
  m1    patodate   baselinedate biopsy status
  <chr> <date>     <date>       <chr>   <int>
1 1     2013-06-03 2018-11-09   1           3
2 1     2014-01-06 2018-11-09   2           3
3 1     2018-11-23 2018-11-09   3           3
4 2     2004-03-03 2018-07-24   1           2
5 2     2018-06-25 2018-07-24   2           2
6 2     2018-12-19 2018-07-24   3           2

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388797

You can get index of minimum absolute value of difference between the dates for each group.

library(dplyr)

df %>%
  group_by(m1) %>%
  mutate(status = which.min(abs(patodate - baselinedate))) %>%
  ungroup

#  m1    patodate   baselinedate biopsy status
#  <chr> <date>     <date>       <chr>   <int>
#1 1     2013-06-03 2018-11-09   1           3
#2 1     2014-01-06 2018-11-09   2           3
#3 1     2018-11-23 2018-11-09   3           3
#4 2     2004-03-03 2018-07-24   1           2
#5 2     2018-06-25 2018-07-24   2           2
#6 2     2018-12-19 2018-07-24   3           2

Upvotes: 1

Related Questions