how to compare dates of two separate column only if another selected column fits a condition

Question

just wondering if anyone can help me with this frustrating problem- im not sure how to go about it as it involves a combination of problems. I'm pretty new to coding so there may be some unclear points in this question- please let me know if something doesn't make sense!

an example of my data

#reproducible data (hopefully)

dat <-structure(list(id = 1:5, 
                     opstart = structure(c(1514970000, 1514904000, 1514916900, 1514883900, 1514979600), 
                                         class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                     crdate_1 = structure(c(1514818380, 1514965080,1514752680, 1514760180, 1514919480), 
                                          class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                      crdate_2 = structure(c(1515058680, 1515058740,1514817780, 1514965080, 1515064560), 
                                          class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                     crdate_3 = structure(c(1515564000, 1515137700,  1514876100, NA, 1516101000), 
                                          class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                      aki_1 = c("aki", NA, NA, NA, "aki"),
                      aki_2 = c("aki", NA, "aki", NA, "aki"), 
                     aki_3 = c("aki", "aki", "aki", NA, "aki")), 
                    row.names = c(NA, 5L), class = "data.frame")

id             opstart            crdate_1            crdate_2            crdate_3 aki_1 aki_2 aki_3
1  1 2018-01-03 09:00:00 2018-01-01 14:53:00 2018-01-04 09:38:00 2018-01-10 06:00:00   aki   aki   aki
2  2 2018-01-02 14:40:00 2018-01-03 07:38:00 2018-01-04 09:39:00 2018-01-05 07:35:00       aki
3  3 2018-01-02 18:15:00 2017-12-31 20:38:00 2018-01-01 14:43:00 2018-01-02 06:55:00     aki   aki
4  4 2018-01-02 09:05:00 2017-12-31 22:43:00 2018-01-03 07:38:00                      
5  5 2018-01-03 11:40:00 2018-01-02 18:58:00 2018-01-04 11:16:00 2018-01-16 11:10:00   aki   aki   aki

what I am trying to do is to make a new column called aki_status using mutate which will detail whether "aki" was before the opdate or after the opdate. crdate_1 corresponds to aki_1, and crdate_2 corresponds to aki_2 etc.

the complication is that the aki_status should be based on the column which "aki" first appears. so for example, for the first row, aki comes up in aki_1, so crdate_1 would be used in the comparison against opdate, but in row three, aki comes up in aki_2 first, so crdate_2 should be used in the comparison against opdate.

the ideal output would be

aki_status(preop, postop, preop, NA, postop)

Jon Spring · Accepted Answer

Here's an approach that relies on first reshaping the original data into a longer format that is easier to work with. The table "spec" here defines which column ("date, "aki", or "obs") we want the columns in the source data to feed into.

library(tidyverse)
spec <- tribble(
  ~.name,    ~.value, ~obs,
  "crdate_1", "date", 1,
  "crdate_2", "date", 2,
  "crdate_3", "date", 3,
  "aki_1",    "aki",  1,
  "aki_2",    "aki",  2,
  "aki_3",    "aki",  3
)

dat_long <- pivot_longer_spec(dat, spec)

This table is easier to work with since it allows us to directly compare each crdate with its respective opstart date, while keeping track of which round it is.

#> dat_long
## A tibble: 15 x 5
#      id opstart               obs date                aki  
#                                 
# 1     1 2018-01-03 09:00:00     1 2018-01-01 14:53:00 aki  
# 2     1 2018-01-03 09:00:00     2 2018-01-04 09:38:00 aki  
# 3     1 2018-01-03 09:00:00     3 2018-01-10 06:00:00 aki  
# 4     2 2018-01-02 14:40:00     1 2018-01-03 07:38:00 NA   
# 5     2 2018-01-02 14:40:00     2 2018-01-04 09:39:00 NA   
# 6     2 2018-01-02 14:40:00     3 2018-01-05 07:35:00 aki  
# 7     3 2018-01-02 18:15:00     1 2017-12-31 20:38:00 NA   
# 8     3 2018-01-02 18:15:00     2 2018-01-01 14:43:00 aki  
# 9     3 2018-01-02 18:15:00     3 2018-01-02 06:55:00 aki  
#10     4 2018-01-02 09:05:00     1 2017-12-31 22:43:00 NA   
#11     4 2018-01-02 09:05:00     2 2018-01-03 07:38:00 NA   
#12     4 2018-01-02 09:05:00     3 NA                  NA   
#13     5 2018-01-03 11:40:00     1 2018-01-02 18:58:00 aki  
#14     5 2018-01-03 11:40:00     2 2018-01-04 11:16:00 aki  
#15     5 2018-01-03 11:40:00     3 2018-01-16 11:10:00 aki

Now we can look at the first "aki" within each "id", to get a summary stat "aki_status" for each "id".

results <- dat_long %>%
  group_by(id) %>%
  filter(aki == "aki") %>%
  slice_min(date) %>%  # or slice_min(obs) -- same result always?
  mutate(aki_status = if_else(date < opstart, "preop", "postop"))

dat %>% left_join(results, by = c("id", "opstart"))

  id             opstart            crdate_1            crdate_2            crdate_3 aki_1 aki_2 aki_3 obs                date  aki aki_status
1  1 2018-01-03 09:00:00 2018-01-01 14:53:00 2018-01-04 09:38:00 2018-01-10 06:00:00   aki   aki   aki   1 2018-01-01 14:53:00  aki      preop
2  2 2018-01-02 14:40:00 2018-01-03 07:38:00 2018-01-04 09:39:00 2018-01-05 07:35:00       aki   3 2018-01-05 07:35:00  aki     postop
3  3 2018-01-02 18:15:00 2017-12-31 20:38:00 2018-01-01 14:43:00 2018-01-02 06:55:00     aki   aki   2 2018-01-01 14:43:00  aki      preop
4  4 2018-01-02 09:05:00 2017-12-31 22:43:00 2018-01-03 07:38:00                        NA                        
5  5 2018-01-03 11:40:00 2018-01-02 18:58:00 2018-01-04 11:16:00 2018-01-16 11:10:00   aki   aki   aki   1 2018-01-02 18:58:00  aki      preop

how to compare dates of two separate column only if another selected column fits a condition

Answers (1)

Related Questions