AMG
AMG

Reputation: 43

R - calculate daily/weekly rate with changing denominator

I'm trying to calculate a daily/weekly prevalence rate for a condition, but the sample size in the denominator varies over time. I have a dataset that includes the date on which each subject entered and left the sample (e.g. birth/death dates), the date on which each subject contracted the condition if applicable, and some demographic characteristics.

  1. How can I calculate the total number of people who were in the sample by day (or by week)?
  2. How can I calculate the daily (or weekly) prevalence rate of the condition given this changing denominator?
  3. For the purposes of statistical inference (e.g. assessing whether the condition's prevalence varies before/after a certain date, controlling for demographic characteristics), how can I include the demographic information in the output generated from #1 and #2?

Example data:

ex <- data.frame(
  id=seq(1:10),
  birth=as.Date(c("12/01/2020", "12/01/2020", "12/01/2020", "12/02/2020", "12/02/2020",
                  "12/02/2020", "12/03/2020", "12/04/2020", "12/04/2020", "12/04/2020")),
  sick=as.Date(c("12/03/2020", "12/04/2020", "12/02/2020", "12/03/2020", "12/06/2020",
                 NA, "12/06/2020", "12/07/2020", "12/09/2020", NA)),
  death=as.Date(c("12/05/2020", "12/05/2020", "12/04/2020", "12/08/2020", "12/07/2020",
                  NA, "12/07/2020", "12/09/2020", "12/10/2020", NA)),
  gender=c("male", "male", "female", "female", "female", "male", "female", "male", "male", "male")
)

Desired output:

sick <- data.frame(
  date=c("12/01/2020", "12/02/2020", "12/03/2020", "12/04/2020", "12/05/2020",
           "12/06/2020", "12/07/2020", "12/08/2020", "12/09/2020", "12/10/2020"),
  count_alive=c(3, 6, 7, 9, 7, 7, 5, 4, 3, 2),
  count_sick=c(0, 1, 3, 4, 1, 4, 2, 1, 1, 0)
)

sick$pct_sick <- sick$count_sick/sick$count_alive*100

Upvotes: 0

Views: 114

Answers (1)

mkpt_uk
mkpt_uk

Reputation: 256

Using the tidyverse my strategy was to create a dummy logical variable for whether the patient was sick and whether they were alive on a particular date. I then created a function that summarised the number of sick and alive patients on that date. Finally I mapped the function over all the dates of interest.

library(tidyverse)

ex <- data.frame(
  id=seq(1:10),
  birth=as.Date(c("12/01/2020", "12/01/2020", "12/01/2020", "12/02/2020", "12/02/2020",
                  "12/02/2020", "12/03/2020", "12/04/2020", "12/04/2020", "12/04/2020"), "%d/%m/%Y"),
  sick=as.Date(c("12/03/2020", "12/04/2020", "12/02/2020", "12/03/2020", "12/06/2020",
                 NA, "12/06/2020", "12/07/2020", "12/09/2020", NA), "%d/%m/%Y"),
  death=as.Date(c("12/05/2020", "12/05/2020", "12/04/2020", "12/08/2020", "12/07/2020",
                  NA, "12/07/2020", "12/09/2020", "12/10/2020", NA), "%d/%m/%Y"),
  gender=c("male", "male", "female", "female", "female", "male", "female", "male", "male", "male")
)

d <- tibble(date = as.Date(c("12/01/2020", "12/02/2020", "12/03/2020", "12/04/2020" ,
                             "12/05/2020", "12/06/2020", "12/07/2020", "12/08/2020",
                             "12/09/2020", "12/10/2020"), "%d/%m/%Y") )

smry <- function(d) {
  ex %>% mutate(alive_now = (death >= d | is.na(death) ),
               sick_now = (sick <= d & !is.na(sick))) %>% 
    summarise(count_alive = sum(alive_now),
              count_sick = sum(sick_now))
}

d %>% mutate(s = map(date, smry)) %>% unnest(s)

This gave:

# A tibble: 10 x 3
   date       count_alive count_sick
   <date>           <int>      <int>
 1 2020-01-12          10          0
 2 2020-02-12          10          1
 3 2020-03-12          10          3
 4 2020-04-12          10          4
 5 2020-05-12           9          4
 6 2020-06-12           7          6
 7 2020-07-12           7          7
 8 2020-08-12           5          7
 9 2020-09-12           4          8
10 2020-10-12           3          8

This is not a full answer but it should start you off.

An alternative strategy without the map is to have one copy of the data associated with each date. You can then group_by the date and summarise.

d <- d %>% mutate(dd = list(ex)) %>% unnest(dd) %>% 
  mutate(alive_now = (death >= date| is.na(death) ),
         sick_now = (sick <= date & !is.na(sick)))

d %>% group_by(date) %>% 
  summarise(count_alive = sum(alive_now),
                count_sick = sum(sick_now))

Upvotes: 1

Related Questions