In R, use two date columns to calculate the average age of ID's at first event

Question

Background

I've got an R dataframe, d:

d <- data.frame(ID = c("a","a","b","b", "c","c","c"),
                birthdate = as.Date(c("1980-01-01","1980-01-01","2000-12-23","2000-12-23","1949-03-14","1949-03-14","1949-03-14")),
                event_date = as.Date(c("2011-01-01","2012-08-21","2011-12-23","2011-12-31","2013-03-14","2013-04-07","2014-07-14")),
                stringsAsFactors=FALSE)

It consists of an ID code and two dates: a birthdate and an event_date. Everyone's got a consistent birthdate, but people have multiple events each, all of these occurring on different dates.

The Problem

I'm trying to calculate the average age of people (IDs) in d at their first event. In other words, I'd like to get R to calculate an "age at the first event" by subtracting each ID's first event from their birthdate, and then summing them and dividing by n (3, in this case).

The answer (if my arithmetic isn't too far off this late at night) should be ~35.3 years old.

What I've tried

I'm not too familiar with date work in R, so I've only gotten so far as mutating a new column that calculates the difference between event_date and birthdate for that row:

d <- d %>% 
  mutate(date_difference = (event_date-birthdate)/365)

But I'm still a ways away from my summary calculation. I'm mainly hung up on how to tell R to find the first date difference for each ID. (Not to mention that dividing by 365 gives me correct years but they're still labeled "days" in the resulting df.)

dipetkov · Accepted Answer

As an alternative to dividing by 365, you can use the lubridate::time_length function. It computes the length of a period in different time units (seconds, minutes, days, years).

library("tidyverse")

dat <- data.frame(
  ID = c("a", "a", "b", "b", "c", "c", "c"),
  birthdate = as.Date(c("1980-01-01", "1980-01-01", "2000-12-23", "2000-12-23", "1949-03-14", "1949-03-14", "1949-03-14")),
  event_date = as.Date(c("2011-01-01", "2012-08-21", "2011-12-23", "2011-12-31", "2013-03-14", "2013-04-07", "2014-07-14")),
  stringsAsFactors = FALSE
)

dat_with_age <- dat %>%
  group_by(ID) %>%
  slice_min(
    event_date,
    n = 1
  ) %>%
  ungroup() %>%
  mutate(
    age_at_first_event = lubridate::time_length(event_date - birthdate, unit = "year")
  )
dat_with_age
#> # A tibble: 3 × 4
#>   ID    birthdate  event_date age_at_first_event
#>                           
#> 1 a     1980-01-01 2011-01-01               31.0
#> 2 b     2000-12-23 2011-12-23               11.0
#> 3 c     1949-03-14 2013-03-14               64

dat_with_age %>%
  summarise(
    mean(age_at_first_event)
  )
#> # A tibble: 1 × 1
#>   `mean(age_at_first_event)`
#>                        
#> 1                       35.3

^{Created on 2022-03-11 by the reprex package (v2.0.1)}

In R, use two date columns to calculate the average age of ID's at first event

Answers (2)

Related Questions

In R, use two date columns to calculate the average age of ID&#39;s at first event

Answers (2)

Related Questions

In R, use two date columns to calculate the average age of ID's at first event