sd3184
sd3184

Reputation: 69

How do I figure out month and year of birth from data that includes multiple observations and age in R

I have data that includes multiple observations of the 1000's of people over many years. These data include the age at the time of observation. The data looks like this:

Person A | 2016-01-0 | 35
Person A | 2016-01-3 | 35
Person A | 2016-01-8 | 36

In this case, you'd be able to tell that Person A was born between 1980-01-04 and 1980-01-08 I'd like to be able to capture the date of birth (the month and year of birth will also do - even just the year of birth is fine) for all people in the data. Not sure, however, how to go about doing this without manually looking through and seeing where the age changes and taking note of the corresponding date.

Any help is greatly appreciated.

Upvotes: 1

Views: 72

Answers (2)

Jon Spring
Jon Spring

Reputation: 66520

Each row gives us one possible range for that person. If you take the latest "min_DOB" and the earliest "max_DOB" of those rows for each person, you can get the kind of narrowed-down range you described.

library(dplyr); library(lubridate)
df %>%
  mutate(min_DOB = V2 %>% ymd - years(V3) - 365, # dragons b/c leap years; 
         max_DOB = V2 %>% ymd - years(V3)) %>%
  group_by(V1) %>%
  summarize(min_DOB = max(min_DOB),
            max_DOB = min(max_DOB))

# A tibble: 1 x 3
  V1       min_DOB    max_DOB   
  <chr>    <date>     <date>    
1 Person A 1980-01-04 1980-01-08
data
df <- data.frame(
  stringsAsFactors = FALSE,
                V1 = c("Person A", "Person A", "Person A"),
                V2 = c("2016-01-01", "2016-01-03", "2016-01-08"),
                V3 = c(35L, 35L, 36L)
)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389047

You may extract the year by removing everything after the first - and subtract the age from it.

df <- transform(df, yob = as.numeric(sub('-.*', '', V2)) - V3)

#        V1        V2 V3  yob
#1 Person A 2016-01-0 35 1981
#2 Person A 2016-01-3 35 1981
#3 Person A 2016-01-8 36 1980

data

df <- structure(list(V1 = c("Person A", "Person A", "Person A"), V2 = c("2016-01-0", 
"2016-01-3", "2016-01-8"), V3 = c(35L, 35L, 36L)), 
row.names = c(NA, -3L), class = "data.frame")

Upvotes: 0

Related Questions