Adam
Adam

Reputation: 453

Complex sorting algorithm in R

I have the following data:

data
# A tibble: 960 x 8
# Groups:   SUBJ_ID, READER [122]
   SUBJ_ID    VISIT          DOS        MOS_DUR   SPD PERCDIFF_SPD SPD_DIFF READER
   <chr>      <chr>          <date>       <dbl> <dbl>        <dbl>    <dbl> <fct> 
 1 1001-31169 1 Screening    2012-05-09       0 1846.         0         0   1     
 2 1001-31169 1 Week 04      2012-06-06       0 1659.       -10.1    -187.  1     
 3 1001-31169 1 Week 08      2012-07-09       2 1924.         4.26     78.5 1     
 4 1001-31169 1 Week 16      2012-08-27       3 1914.         3.74     69.0 1     
 5 1001-31169 Unscheduled 01 2012-10-22       5 2094.        13.5     249.  1     
 6 1001-31169 Unscheduled 02 2012-12-17       7 1890.         2.44     44.9 1     
 7 1001-31169 Unscheduled 03 2013-02-11       9 1370.       -25.8    -476.  1     
 8 1001-31169 Unscheduled 04 2013-04-10      11  986.       -46.6    -860.  1     
 9 1001-31169 Unscheduled 05 2013-07-15      14  570.       -69.1   -1275.  1     
10 1001-31169 Unscheduled 06 2013-11-01      17  349.       -81.1   -1497.  1     
# ... with 950 more rows
#Note, PERCDIFF_SPD is difference from baseline/screening

str(data)
tibble [960 x 8] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ Visit Name  : chr [1:960] "1 Screening" "1 Week 04" "1 Week 08" "1 Week 16" ...
 $ Subject ID  : chr [1:960] "1001-31169" "1001-31169" "1001-31169" "1001-31169" ...
 $ DOS         : Date[1:960], format: "2012-05-09" "2012-06-06" "2012-07-09" "2012-08-27" ...
 $ Reader Name : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ SPD         : num [1:960] 1846 1659 1924 1914 2094 ...
 $ MOS_DUR     : num [1:960] 0 0 2 3 5 7 9 11 14 17 ...
 $ PERCDIFF_SPD: num [1:960] 0 -10.12 4.26 3.74 13.48 ...
 $ SPD_DIFF    : num [1:960] 0 -186.7 78.5 69 248.7 ...
 - attr(*, "groups")= tibble [122 x 3] (S3: tbl_df/tbl/data.frame)
  ..$ Subject ID : chr [1:122] "1001-31169" "1001-31169" "1002-31169" "1002-31169" ...
  ..$ Reader Name: Factor w/ 5 levels "1","2","3","4",..: 1 4 2 4 4 5 1 3 1 4 ...
  ..$ .rows      : list<int> [1:122] 
#grouped rows redacted for ease of presentation

First, I want to set the nadir SPD value to the SPD at the screening visit.

Then I want to iterate through the SPD values comparing the current SPD value to the nadir SPD value, set the current SPD as the new nadir SPD if current < nadir, and also assign 1 of 4 response values depending on the % difference, which are listed below:

>=25% assign as "Progressive Disease" (abbreviated PD)

> -50% & <25% assign as "Stable Disease" (abbreviated SD)

> -100% & <= -50% assign as "Partial Response" (abbreviated PR)

== -100% assign as "Complete Response" (abbreviated CR)

I want to do this across each reader for each subject, with one caveat. Once a subject has "PD" they should be marked as "PD" for every subsequent reading.

For example, for subject 1001-31169, the SPD nadir would be initially set to 1846 for reader 1. The response for the screening visit should be set to "SD".

Then, for the next SPD measurement, the SPD current would be set to 1659 and a response of "SD" would be assigned since the % difference (-10.1%) is > -50% & <25%.

Since the current SPD (1659) is lower than the current nadir (1846), the current SPD (1659) would now be set as the nadir SPD and compared to the next measurement (1924).

1924>1659, so the nadir SPD would not be reset and a response of "SD" would be assigned, since the % diff is 16%.

I have tried to use for loops to no avail. Here are 2 other posts on the same topic if you want additional background (Post 1; Post 2).

I have also provided a play dataset below with "known" responses to troubleshoot the code:

SUBJ_ID <- c("1","1","1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2","2","2","2","2","2","2") 
VISIT <- c("Screening", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Screening", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Screening", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5", "Screening", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5")
SPD <- c(100, 120, 90, 80, 60, 100, 100, 130, 90, 80, 60, 100, 50, 20, 0, 0, 0, 30, 60, 40, 20, 0, 0, 40)
READER <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
KNOWNRESPONSE <- c("SD", "SD", "SD", "SD", "PR", "PD", "SD", "PD", "PD", "PD", "PD", "PD", "SD", "PR", "CR", "CR", "CR", "PD", "SD", "SD", "SD", "CR", "CR", "PD")

data <- data.frame(SUBJ_ID, VISIT, SPD, READER, KNOWNRESPONSE) 
data

> data
   SUBJ_ID     VISIT SPD READER KNOWNRESPONSE
1        1 Screening 100      1            SD
2        1    Week 1 120      1            SD
3        1    Week 2  90      1            SD
4        1    Week 3  80      1            SD
5        1    Week 4  60      1            PR
6        1    Week 5 100      1            PD
7        1 Screening 100      2            SD
8        1    Week 1 130      2            PD
9        1    Week 2  90      2            PD
10       1    Week 3  80      2            PD
11       1    Week 4  60      2            PD
12       1    Week 5 100      2            PD
13       2 Screening  50      1            SD
14       2    Week 1  20      1            PR
15       2    Week 2   0      1            CR
16       2    Week 3   0      1            CR
17       2    Week 4   0      1            CR
18       2    Week 5  30      1            PD
19       2 Screening  60      2            SD
20       2    Week 1  40      2            SD
21       2    Week 2  20      2            SD
22       2    Week 3   0      2            CR
23       2    Week 4   0      2            CR
24       2    Week 5  40      2            PD

Upvotes: 0

Views: 33

Answers (1)

Jon Spring
Jon Spring

Reputation: 66500

library(dplyr)
data %>%
  group_by(SUBJ_ID, READER) %>%
  mutate(SPD_cuml_min = cummin(SPD),
         vs_prior_min = (SPD / lag(SPD_cuml_min)) - 1,
         RESP2 = case_when(
           vs_prior_min > 0.25 ~ "PD",
           vs_prior_min > -0.5 ~ "SD",
           vs_prior_min > -1 ~ "PR",
           vs_prior_min == -1 ~ "CR"
         ),
         RESP2 = coalesce(KNOWNRESPONSE, RESP2)) %>%
group_by(SUBJ_ID) %>%
  mutate(RESP3 = if_else(
    cumsum(RESP2 == "PD") > 0,
    "PD", "SD")) %>%
  ungroup()

Result

# A tibble: 24 x 9
   SUBJ_ID VISIT       SPD READER KNOWNRESPONSE SPD_cuml_min vs_prior_min RESP2 RESP3
   <chr>   <chr>     <dbl>  <dbl> <chr>                <dbl>        <dbl> <chr> <chr>
 1 1       Screening   100      1 SD                     100       NA     SD    SD   
 2 1       Week 1      120      1 SD                     100        0.200 SD    SD   
 3 1       Week 2       90      1 SD                      90       -0.100 SD    SD   
 4 1       Week 3       80      1 SD                      80       -0.111 SD    SD   
 5 1       Week 4       60      1 PR                      60       -0.25  PR    SD   
 6 1       Week 5      100      1 PD                      60        0.667 PD    PD   
 7 1       Screening   100      2 SD                     100       NA     SD    PD   
 8 1       Week 1      130      2 PD                     100        0.3   PD    PD   
 9 1       Week 2       90      2 PD                      90       -0.100 PD    PD   
10 1       Week 3       80      2 PD                      80       -0.111 PD    PD 

Upvotes: 1

Related Questions