Tom O
Tom O

Reputation: 1597

Conditionally calculate difference between multiple rows

I've been struggling to conditionally calculate a time difference across multiple rows/columns. I would like to calculate by id the time difference between an initial negative result and a subsequent positive. I've been trying to do this within dplyr but maybe I need another approach.

Code for data :

id <- c(1,1,1,1,2,2,2,2,3,3,3,3)
firstnegative <-c('T', 'F','F','F', 'F','F','F','F','T','F','F','F') 
organism <- c('neg', 'COVID', 'COVID', 'neg', 'COVID', 'neg', 'neg', 'neg', 'neg', 'neg', 'COVID', 'COVID')
date <- seq(as.Date("2020/3/1"), as.Date('2020/3/12'), "days")
data <- data.frame (id,date, organism, firstnegative)

 id       date organism  firstnegative
  1 2020-03-01      neg             T
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

Expected Result

 id       date organism  firstnegative   timediff
  1 2020-03-01      neg             T      1d
  1 2020-03-02    COVID             F
  1 2020-03-03    COVID             F
  1 2020-03-04      neg             F
  2 2020-03-05    COVID             F
  2 2020-03-06      neg             F
  2 2020-03-07      neg             F
  2 2020-03-08      neg             F
  3 2020-03-09      neg             T      2d
  3 2020-03-10      neg             F
  3 2020-03-11    COVID             F
  3 2020-03-12    COVID             F

The length between the first negative and subsequent positives is unknown and changes. I can't assume a lead of 1.
Any thoughts/approaches would be greatly appreciated.

Upvotes: 0

Views: 98

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388982

For each id you can find the first date of 'COVID' and subtract it with first time 'T' was present in firstnegative. Since we want value only on first row we replace rest of the values with NA.

library(dplyr)

data %>%
  group_by(id) %>%
  mutate(timediff = date[match('COVID', organism)] - 
                    date[match('T', firstnegative)],
         timediff = as.numeric(replace(timediff, -1L, NA)))

#      id date       organism firstnegative timediff
#   <dbl> <date>     <chr>    <chr>            <dbl>
# 1     1 2020-03-01 neg      T                    1
# 2     1 2020-03-02 COVID    F                   NA
# 3     1 2020-03-03 COVID    F                   NA
# 4     1 2020-03-04 neg      F                   NA
# 5     2 2020-03-05 COVID    F                   NA
# 6     2 2020-03-06 neg      F                   NA
# 7     2 2020-03-07 neg      F                   NA
# 8     2 2020-03-08 neg      F                   NA
# 9     3 2020-03-09 neg      T                    2
#10     3 2020-03-10 neg      F                   NA
#11     3 2020-03-11 COVID    F                   NA
#12     3 2020-03-12 COVID    F                   NA

Upvotes: 1

Related Questions