Conditionally calculate difference between multiple rows

Question

I've been struggling to conditionally calculate a time difference across multiple rows/columns. I would like to calculate by id the time difference between an initial negative result and a subsequent positive. I've been trying to do this within dplyr but maybe I need another approach.

Code for data :

id <- c(1,1,1,1,2,2,2,2,3,3,3,3)
firstnegative <-c('T', 'F','F','F', 'F','F','F','F','T','F','F','F') 
organism <- c('neg', 'COVID', 'COVID', 'neg', 'COVID', 'neg', 'neg', 'neg', 'neg', 'neg', 'COVID', 'COVID')
date <- seq(as.Date("2020/3/1"), as.Date('2020/3/12'), "days")
data <- data.frame (id,date, organism, firstnegative)

 id       date organism  firstnegative
  1 2020-03-01      neg             T
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

Expected Result

 id       date organism  firstnegative   timediff
  1 2020-03-01      neg             T      1d
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T      2d
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

The length between the first negative and subsequent positives is unknown and changes. I can't assume a lead of 1.
Any thoughts/approaches would be greatly appreciated.

Ronak Shah · Accepted Answer

For each id you can find the first date of 'COVID' and subtract it with first time 'T' was present in firstnegative. Since we want value only on first row we replace rest of the values with NA.

library(dplyr)

data %>%
  group_by(id) %>%
  mutate(timediff = date[match('COVID', organism)] - 
                    date[match('T', firstnegative)],
         timediff = as.numeric(replace(timediff, -1L, NA)))

#      id date       organism firstnegative timediff
#                         
# 1     1 2020-03-01 neg      T                    1
# 2     1 2020-03-02 COVID    F                   NA
# 3     1 2020-03-03 COVID    F                   NA
# 4     1 2020-03-04 neg      F                   NA
# 5     2 2020-03-05 COVID    F                   NA
# 6     2 2020-03-06 neg      F                   NA
# 7     2 2020-03-07 neg      F                   NA
# 8     2 2020-03-08 neg      F                   NA
# 9     3 2020-03-09 neg      T                    2
#10     3 2020-03-10 neg      F                   NA
#11     3 2020-03-11 COVID    F                   NA
#12     3 2020-03-12 COVID    F                   NA

Conditionally calculate difference between multiple rows

Answers (1)

Related Questions