wdefreit
wdefreit

Reputation: 73

Creating Indicator from Index Date in R

I have a dataframe structured like this

df <- tribble(
  ~pat, ~drug, ~`2023-01-01`, ~`2023-01-02`, ~`2023-01-03`, ~`2023-01-04`, ~`2023-01-05`,
  "1", "tylenol",  3, 0, 0, 1, 0,
  "1", "advil",    0, 2, 0, 0, 0,
  "2", "tylenol",  0, 0, 3, 0, 0)

I would like to take the values under the respective dates and use them as rowise indicators so my dataframe looks like this. The key is that the non-0 values will be used to populate a rowwise indicator based on its value (i.e. 3 means an indicator for that day, plus two more days).

df2 <- tribble(
  ~pat, ~drug, ~`2023-01-01`, ~`2023-01-02`, ~`2023-01-03`, ~`2023-01-04`, ~`2023-01-05`,
  "1", "tylenol",  1, 1, 1, 1, 0,
  "1", "advil",    0, 1, 1, 0, 0,
  "2", "tylenol",  0, 0, 1, 1, 1)

I would like to output a dataframe with one patient per row that looks like this.

df3 <- tribble(
  ~pat, ~`2023-01-01`, ~`2023-01-02`, ~`2023-01-03`, ~`2023-01-04`, ~`2023-01-05`,
  "1", 1, 1, 1, 1, 0,
  "2", 0, 0, 1, 1, 1)

The goal is to create a df with one row per patient that uses days supply of multiple drugs to create indicators for the proportion of days covered in R (preferably using tidyverse). I have >600k patients, >20 different drugs, and 2.5years worth of data. I have been struggling with this for a while. Can someone please help?? Thank you.

Upvotes: 2

Views: 105

Answers (1)

Andy Baxter
Andy Baxter

Reputation: 7626

A possible solution would be pivoting your dataframe and using the values to "spread" 1s down the column until they reach 0. Then summarise to note any days of any drugs.

As untidy as it is to use a for loop in R I think it's the only way here, as calculation of each subsequent value in the vector relies on completion of the previous calculation. A function (here cumlag) can at least do this in a tidy way:

library(tidyverse)

df <- tribble(
  ~pat, ~drug, ~`2023-01-01`, ~`2023-01-02`, ~`2023-01-03`, ~`2023-01-04`, ~`2023-01-05`,
  "1", "tylenol",  3, 0, 0, 1, 0,
  "1", "advil",    0, 2, 0, 0, 0,
  "2", "tylenol",  0, 0, 3, 0, 0)

cumlag <- function(vals) {
  
  out_len <- length(vals)
  
  for (v in 1:out_len) {
    vals[v + 1] <- max(vals[v + 1], vals[v] - 1)
    vals[v] <- min(vals[v], as.logical(vals[v]))
  }
  
  vals[1:out_len]
  
}

df |> 
  pivot_longer(-c(pat, drug), names_to = "date", values_to = "value") |> 
  group_by(pat, drug) |> 
  mutate(value = cumlag(value)) |> 
  group_by(pat, date) |> 
  summarise(value = max(value)) |> 
  pivot_wider(names_from = date, values_from = value)
#> `summarise()` has grouped output by 'pat'. You can override using the `.groups`
#> argument.
#> # A tibble: 2 × 6
#> # Groups:   pat [2]
#>   pat   `2023-01-01` `2023-01-02` `2023-01-03` `2023-01-04` `2023-01-05`
#>   <chr>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 1                1            1            1            1            0
#> 2 2                0            0            1            1            1

Upvotes: 1

Related Questions