T.McMillen
T.McMillen

Reputation: 137

How to count number of observations that were negative before receiving a positive result

I am working with some clinical data and I would like to count the number of tests someone had that were "Not Detected" before they had a "Detected" result and exclude anyone that never had a "Detected" result.

ID <- c(1,1,2,2,3,3,3,4)
Specimen_Type <- c("NP", "NP", "Throat", "Throat", "NP", "Throat", "Throat", "NP")
RESULT_VAL <- c("Not Detected", "Detected", "Not Detected", "Detected", "Not Detected", "Not Detected", "Detected", "Not Detected")
RESULT_DATE <- c("6-1-2020", "6-10-2020","6-1-2020", "6-10-2020","6-1-2020", "6-10-2020", "6-20-2020", "6-1-2020")
Data_sum<- data.frame(ID, Specimen_Type, RESULT_VAL, RESULT_DATE)

I would like the output to look like this (or something like this) because I would also like to see how many different specimen types were used for testing prior to the positive result.

ID     Number_of_tests_before_pos     Specimen_Type     
1            1                              NP
2            1                              Throat 
3            2                              NP,Throat

Upvotes: 2

Views: 56

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

You can first remove any ID that does not have 'Detected' in RESULT_VAL, count number of rows before 'Detected' and combine Specimen_Type into one-comma separated string.

library(dplyr)

Data_sum %>%
  group_by(ID) %>%
  filter(any(RESULT_VAL == 'Detected')) %>%
  summarise(n = match('Detected', RESULT_VAL) - 1, 
            Specimen_Type = toString(Specimen_Type[1:n]))

#    ID     n Specimen_Type
#  <dbl> <dbl> <chr>        
#1     1     1 NP           
#2     2     1 Throat       
#3     3     2 NP, Throat   

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101335

Here is a base R option

transform(
  aggregate(
    . ~ ID,
    cbind(
      subset(Data_sum,
        ave(RESULT_VAL == "Detected", ID, FUN = function(x) cumsum(x) == 0 & any(x)),
        select = c(ID, Specimen_Type)
      ),
      Number_of_tests_before_pos = 1
    ),
    c
  ),
  Number_of_tests_before_pos = lengths(Number_of_tests_before_pos)
)

giving

  ID Specimen_Type Number_of_tests_before_pos
1  1            NP                          1
2  2        Throat                          1
3  3    NP, Throat                          2

Upvotes: 2

akrun
akrun

Reputation: 887108

After grouping by 'ID', use match to get the index of the first occurrence of 'Detected' in 'RESULT_VAL' ('i1'), if there is no match, by default, returns NA, use that info to remove the IDs in filter, then we use summarise to get the count of 'Not Detected' in the subset of 'RESULT_VAL' until that index, similarly paste the unique elements of 'Specimen_Type' on the subset data

library(dplyr)
Data_sum %>% 
   group_by(ID) %>% 
   mutate(i1 = match('Detected', RESULT_VAL)) %>% 
   filter(complete.cases(i1)) %>%
   summarise(Number_of_tests_before_pos = sum(RESULT_VAL[seq(i1[1])] == 
          'Not Detected'),
          Specimen_Type = toString(unique(Specimen_Type[seq(i1[1])])),
           .groups = 'drop')

-output

# A tibble: 3 x 3
#     ID Number_of_tests_before_pos Specimen_Type
#  <dbl>                      <int> <chr>        
#1     1                          1 NP           
#2     2                          1 Throat       
#3     3                          2 NP, Throat   

Or we can can use cumsum on a logical vector to do the filtering

Data_sum %>% 
     group_by(ID) %>% 
     filter(cumsum(RESULT_VAL == 'Detected') < 1,
            any(RESULT_VAL == 'Detected')) %>%
     summarise(Number_of_tests_before_pos = sum(RESULT_VAL == 'Not Detected'),
               Specimen_Type = toString(unique(Specimen_Type)), .groups = 'drop')
# A tibble: 3 x 3
#     ID Number_of_tests_before_pos Specimen_Type
#  <dbl>                      <int> <chr>        
#1     1                          1 NP           
#2     2                          1 Throat       
#3     3                          2 NP, Throat   

Upvotes: 1

Related Questions