Milad Nazarzadeh

Counting and then summing a string variable at specific time points in a long data frame

I have a dataset like this:

    df1 <- structure(list(Participant_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
    2, 2, 2, 2), group = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1), 
        stra_arm = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6), time = c(0, 
        0, 0, 0, 0, 6, 68, 102, 111, 0, 0, 0, 0), name_class = c("beta_bloker", 
        "ACE", "Thiazide_du", "alpha_bloker", "CCB", "alpha_bloker", 
        "CCB", "CCB", "CCB", "beta_bloker", "ACE", "loope_du", "pot_du"
        ), stop = c(NA, NA, NA, NA, NA, "Yes", "Yes", NA, NA, NA, 
        NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -13L), spec = structure(list(cols = list(
        Participant_ID = structure(list(), class = c("collector_double", 
        "collector")), group = structure(list(), class = c("collector_double", 
        "collector")), stra_arm = structure(list(), class = c("collector_double", 
        "collector")), time = structure(list(), class = c("collector_double", 
        "collector")), name_class = structure(list(), class = c("collector_character", 
        "collector")), stop = structure(list(), class = c("collector_character", 
        "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Within each participant and at each time point, I want to count each value of the column "name_class" and then sum the counts, with the condition that an entry counts as -1 if the "stop" column for that same "name_class" entry is "Yes", and as +1 if "stop" is NA. The result is effectively the number of drug class changes for each patient during follow-up.
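To make the scoring rule concrete, here is a minimal base R sketch for participant 1 alone, with the `stop` and `time` values copied from the data above. The names `stop_p1`, `time_p1`, `score` and `running` are illustrative only. It assumes rows are already in time order, that the per-time value is the running total of the scores, and (since the question only defines "Yes" and NA) that any other `stop` value scores 0.

```r
# `stop` and `time` values for participant 1, in time order (copied from the data above)
stop_p1 <- c(NA, NA, NA, NA, NA, "Yes", "Yes", NA, NA)
time_p1 <- c(0, 0, 0, 0, 0, 6, 68, 102, 111)

# +1 when stop is NA, -1 when stop is "Yes", 0 for anything else (assumption)
score <- ifelse(is.na(stop_p1), 1, ifelse(stop_p1 == "Yes", -1, 0))

# running total of drug classes after each row
running <- cumsum(score)
running
#> [1] 1 2 3 4 5 4 3 4 5

# one value per time point: the largest running total reached at that time
tapply(running, time_p1, max)
```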

The final dataset will be something like this: [image of the expected output]

Any ideas would be really appreciated.

Answers (1)

akrun

Perhaps this helps

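    library(dplyr)

    df1 %>% 
       # +1 for rows where stop is NA, -1 where stop is "Yes" (0 otherwise)
       mutate(count = replace(+(is.na(stop)), stop == "Yes", -1)) %>% 
       # running total of drug classes within each participant
       group_by(Participant_ID, group, stra_arm) %>%
       mutate(count = cumsum(count)) %>%
       # one row per participant and time point, keeping the largest running total
       group_by(time, .add = TRUE) %>%
       summarise(count = max(count), .groups = 'drop')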

Output:

    # A tibble: 6 × 5
      Participant_ID group stra_arm  time count
               <dbl> <dbl>    <dbl> <dbl> <dbl>
    1              1     2        3     0     5
    2              1     2        3     6     4
    3              1     2        3    68     3
    4              1     2        3   102     4
    5              1     2        3   111     5
    6              2     1        6     0     4
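For comparison, a base R sketch of the same computation, in case dplyr is not available: `ave()` computes the running total within each participant and `aggregate()` keeps the maximum per time point. It rebuilds the question's data as a plain `data.frame` (dropping the readr `spec` attribute) and, like the dplyr version, assumes rows are in time order within each participant. On the sample data it should give the same counts as the output above.

```r
# Data from the question, rebuilt as a plain data.frame
df1 <- data.frame(
  Participant_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  group          = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1),
  stra_arm       = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6),
  time           = c(0, 0, 0, 0, 0, 6, 68, 102, 111, 0, 0, 0, 0),
  name_class     = c("beta_bloker", "ACE", "Thiazide_du", "alpha_bloker", "CCB",
                     "alpha_bloker", "CCB", "CCB", "CCB",
                     "beta_bloker", "ACE", "loope_du", "pot_du"),
  stop           = c(NA, NA, NA, NA, NA, "Yes", "Yes", NA, NA, NA, NA, NA, NA)
)

# +1 when stop is NA, -1 when stop is "Yes", 0 otherwise (same coding as the dplyr version)
df1$count <- ifelse(is.na(df1$stop), 1, ifelse(df1$stop == "Yes", -1, 0))

# running total within each participant (rows assumed to be in time order)
df1$count <- ave(df1$count, df1$Participant_ID, df1$group, df1$stra_arm, FUN = cumsum)

# keep the largest running total reached at each time point
res <- aggregate(count ~ Participant_ID + group + stra_arm + time, data = df1, FUN = max)

# sort by participant and time to match the tibble above
res <- res[order(res$Participant_ID, res$time), ]
res
```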
