Reputation: 137
I am working with some clinical data and I would like to count the number of tests someone had that were "Not Detected" before they had a "Detected" result and exclude anyone that never had a "Detected" result.
ID <- c(1,1,2,2,3,3,3,4)
Specimen_Type <- c("NP", "NP", "Throat", "Throat", "NP", "Throat", "Throat", "NP")
RESULT_VAL <- c("Not Detected", "Detected", "Not Detected", "Detected", "Not Detected", "Not Detected", "Detected", "Not Detected")
RESULT_DATE <- c("6-1-2020", "6-10-2020","6-1-2020", "6-10-2020","6-1-2020", "6-10-2020", "6-20-2020", "6-1-2020")
Data_sum<- data.frame(ID, Specimen_Type, RESULT_VAL, RESULT_DATE)
I would like the output to look like this (or something like this) because I would also like to see how many different specimen types were used for testing prior to the positive result.
ID Number_of_tests_before_pos Specimen_Type
1 1 NP
2 1 Throat
3 2 NP,Throat
Upvotes: 2
Views: 56
Reputation: 388982
You can first remove any ID
that does not have 'Detected'
in RESULT_VAL
, count number of rows before 'Detected'
and combine Specimen_Type
into one-comma separated string.
library(dplyr)
Data_sum %>%
group_by(ID) %>%
filter(any(RESULT_VAL == 'Detected')) %>%
summarise(n = match('Detected', RESULT_VAL) - 1,
Specimen_Type = toString(Specimen_Type[1:n]))
# ID n Specimen_Type
# <dbl> <dbl> <chr>
#1 1 1 NP
#2 2 1 Throat
#3 3 2 NP, Throat
Upvotes: 1
Reputation: 101335
Here is a base R option
transform(
aggregate(
. ~ ID,
cbind(
subset(Data_sum,
ave(RESULT_VAL == "Detected", ID, FUN = function(x) cumsum(x) == 0 & any(x)),
select = c(ID, Specimen_Type)
),
Number_of_tests_before_pos = 1
),
c
),
Number_of_tests_before_pos = lengths(Number_of_tests_before_pos)
)
giving
ID Specimen_Type Number_of_tests_before_pos
1 1 NP 1
2 2 Throat 1
3 3 NP, Throat 2
Upvotes: 2
Reputation: 887108
After grouping by 'ID', use match
to get the index of the first occurrence of 'Detected' in 'RESULT_VAL' ('i1'), if there is no match, by default, returns NA
, use that info to remove the IDs in filter
, then we use summarise
to get the count of 'Not Detected' in the subset of 'RESULT_VAL' until that index, similarly paste
the unique
elements of 'Specimen_Type' on the subset data
library(dplyr)
Data_sum %>%
group_by(ID) %>%
mutate(i1 = match('Detected', RESULT_VAL)) %>%
filter(complete.cases(i1)) %>%
summarise(Number_of_tests_before_pos = sum(RESULT_VAL[seq(i1[1])] ==
'Not Detected'),
Specimen_Type = toString(unique(Specimen_Type[seq(i1[1])])),
.groups = 'drop')
-output
# A tibble: 3 x 3
# ID Number_of_tests_before_pos Specimen_Type
# <dbl> <int> <chr>
#1 1 1 NP
#2 2 1 Throat
#3 3 2 NP, Throat
Or we can can use cumsum
on a logical vector to do the filter
ing
Data_sum %>%
group_by(ID) %>%
filter(cumsum(RESULT_VAL == 'Detected') < 1,
any(RESULT_VAL == 'Detected')) %>%
summarise(Number_of_tests_before_pos = sum(RESULT_VAL == 'Not Detected'),
Specimen_Type = toString(unique(Specimen_Type)), .groups = 'drop')
# A tibble: 3 x 3
# ID Number_of_tests_before_pos Specimen_Type
# <dbl> <int> <chr>
#1 1 1 NP
#2 2 1 Throat
#3 3 2 NP, Throat
Upvotes: 1