Reputation: 4930
In prospective studies, you want to summarize how old your sample is, over which years they were observed, and how long they were observed altogether. Collectively, these are considered the age, period, and cohort time scales of the sample.
The easiest way to illustrate is with simulated data. Suppose these data summarize a cohort of clinic patients, with their baseline ages and the start and stop dates of observation:
set.seed(123)
n <- 10000
age <- sample(seq(40, 80, by=5), n, replace=TRUE)
n0 <- runif(n, 10000, 12000) # start dates as days since 1970-01-01
Obs <- data.frame(
  'age' = age,
  'start' = as.Date(n0, origin="1970-01-01"),
  'end' = as.Date(n0 + runif(n, 0, 3652.5), origin="1970-01-01")
)
I want a function foo that takes vectors of cut points:
AgeCut <- c(0, 65, Inf)
Yrcut <- c(0, 2000, Inf)
DurCut <- c(0, 5, Inf)
It should then cross-tabulate the number of individuals who fall into each possible combination of those categories for at least one day. Or, more complicated still, the number of years a person spends in each category. For instance, a person who is 40 when they enter the sample in 1990 and stays in for 30 years would be in the yt65/bf2000/lt5year category for 5 years, then move to yt65/bf2000/gt5year for another 5 years, then to yt65/af2000/gt5year for 15 more years, and finally to ot65/af2000/gt5year for the last 5 years.
For some reason, this is racking my brain so badly that I can't calculate the actual desired output, even via some inefficient for loop, but the format and structure would be something like:
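The split for that one example person can be checked by hand. The following is only a minimal sketch for a single hypothetical participant, not a general solution; the names age0, year0, follow_up, crossings, bounds and segments are made up for illustration, and the cut points 65, 2000 and 5 are hard-coded from the vectors above.

```r
# Hypothetical single participant from the example in the text
age0      <- 40    # age at entry
year0     <- 1990  # calendar year at entry
follow_up <- 30    # total years observed

# Years after entry at which each cut point is crossed
crossings <- c(65 - age0, 2000 - year0, 5)

# Segment boundaries on the follow-up time scale (0 = entry)
inside <- crossings > 0 & crossings < follow_up
bounds <- sort(unique(c(0, crossings[inside], follow_up)))
seg_start <- head(bounds, -1)

# Classify each segment by its state at the start of the segment
segments <- data.frame(
  AgeCut = ifelse(age0 + seg_start < 65, "yt65", "ot65"),
  YrCut  = ifelse(year0 + seg_start < 2000, "bf2000", "af2000"),
  DurCut = ifelse(seg_start < 5, "lt5year", "gt5year"),
  years  = diff(bounds)
)
print(segments)
```

The years column comes out as 5, 5, 15 and 5, which adds up to the 30 years of follow-up.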
  AgeCut          YrCut          DurCut            NumObs
1 younger than 65 before 2000    less than 5 years   1000
2 65 and older    before 2000    less than 5 years   1000
3 younger than 65 2000 and later less than 5 years   1000
4 65 and older    2000 and later less than 5 years   1000
5 younger than 65 before 2000    5 or more years     1000
6 65 and older    before 2000    5 or more years     1000
7 younger than 65 2000 and later 5 or more years     1000
8 65 and older    2000 and later 5 or more years     1000
Upvotes: 1
Views: 258
Reputation: 4930
OK, I have this implementation in base R. It recursively evaluates the time spent in the current category until moving to the next one, adds that duration to the various counters and subtracts it from the overall duration of study participation, then feeds the updated times and durations back into the apc function.
apc <- function(times, cuts, dur, strata=1) {
  ## index of the current interval on each time scale
  class <- mapply(findInterval, times, cuts)
  tnext <- mapply( ## times until next category
    function(t, c, i) {c[i+1] - t},
    times, cuts, as.data.frame(class)
  )
  mnext <- apply(tnext, 1, min, na.rm=TRUE) ## minimum time to next category
  mnext <- pmin(mnext, dur) ## truncate if duration exceeded before next
  dur <- dur-mnext ## follow-up still left to allocate
  times <- lapply(times, `+`, mnext) ## advance every time scale by the same amount
  if (all(dur == 0)) ## stop once all follow-up has been allocated
    return(list(data.frame(class, 't'=mnext, strata)))
  return(c(list(data.frame(class, 't'=mnext, strata)), apc(times, cuts, dur, strata=strata)))
}
This estimates the following number of person-years in each category:
> val
  age start cohort strata         t
1   1     1      1      1  3175.986
2   2     1      1      1  2582.793
3   1     2      1      1 17714.503
4   2     2      1      1 13972.134
5   1     2      2      1  5658.430
6   2     2      2      1  6957.702
The total (50,061.55) equals the sum of Obs$end - Obs$start once that is converted to years.
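The call that produced val is not shown. As a rough sketch of how apc might be driven and its pieces summed, here it is applied to two hypothetical participants whose answer can be worked out by hand. The input layout (a named list of starting values per time scale, all in years), the variable names times, cuts, dur, pieces and val, and the use of aggregate() are assumptions, not the original code; apc itself is repeated only so the block runs on its own.

```r
# apc() as given above, repeated so the sketch is self-contained
apc <- function(times, cuts, dur, strata = 1) {
  class <- mapply(findInterval, times, cuts)
  tnext <- mapply(function(t, c, i) c[i + 1] - t,
                  times, cuts, as.data.frame(class))
  mnext <- pmin(apply(tnext, 1, min, na.rm = TRUE), dur)
  dur <- dur - mnext
  times <- lapply(times, `+`, mnext)
  piece <- list(data.frame(class, t = mnext, strata))
  if (all(dur == 0)) return(piece)
  c(piece, apc(times, cuts, dur, strata = strata))
}

# Two hypothetical participants (assumed input layout, all in years):
# A is 40 in 1990 and followed for 30 years; B is 70 in 2001, followed for 3
times <- list(age = c(40, 70), start = c(1990, 2001), cohort = c(0, 0))
cuts  <- list(c(0, 65, Inf), c(0, 2000, Inf), c(0, 5, Inf))
dur   <- c(30, 3)

# Stack the per-step pieces and sum person-years within each category
pieces <- do.call(rbind, apc(times, cuts, dur))
val <- aggregate(t ~ age + start + cohort + strata, data = pieces, FUN = sum)
print(val)
```

Participant A contributes 5, 5, 15 and 5 years to four categories and B contributes 3 years to a fifth, so the t column should total 33, the sum of dur.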
Upvotes: 1
Reputation: 206411
Using some tidyverse functions, I think you want something like this:
library(tidyverse)
library(lubridate) # provides year(); older tidyverse versions do not attach it
AgeCut <- c(0, 65, Inf)
Yrcut <- c(0, 2000, Inf)
DurCut <- c(0, 5, Inf)
Obs %>% transmute(
  ageCat = cut(age, AgeCut, c("younger than 65", "65 and older"), right=FALSE),
  startCat = cut(year(start), Yrcut, c("before 2000", "2000 and later"), right=FALSE),
  DurCut = cut(year(end)-year(start), DurCut, c("less than 5 years", "5 or more years"), right=FALSE)
) %>% table() %>% as_tibble() # as_data_frame() is deprecated in favor of as_tibble()
This returns:
  ageCat          startCat       DurCut                n
  <chr>           <chr>          <chr>             <int>
1 younger than 65 before 2000    less than 5 years  1196
2 65 and older    before 2000    less than 5 years   968
3 younger than 65 2000 and later less than 5 years  1312
4 65 and older    2000 and later less than 5 years  1015
5 younger than 65 before 2000    5 or more years    1503
6 65 and older    before 2000    5 or more years    1185
7 younger than 65 2000 and later 5 or more years    1580
8 65 and older    2000 and later 5 or more years    1241
The cut() function is doing most of the work here.
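To see what cut() does with right=FALSE, here is a small sketch on a few made-up ages (the vector ages and the name age_cat are illustrative only). With right=FALSE the intervals are closed on the left, so a value exactly equal to 65 falls into the second category.

```r
# Illustrative ages, including one exactly on the cut point
ages <- c(40, 64, 65, 80)

# Intervals are [0, 65) and [65, Inf) because right = FALSE
age_cat <- cut(ages, breaks = c(0, 65, Inf),
               labels = c("younger than 65", "65 and older"),
               right = FALSE)

print(data.frame(ages, age_cat))
print(table(age_cat))
```

Here 40 and 64 are labelled "younger than 65", while 65 and 80 are labelled "65 and older".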
Upvotes: 1