Reputation: 69
I have data that includes multiple observations of the 1000's of people over many years. These data include the age at the time of observation. The data looks like this:
Person A | 2016-01-0 | 35
Person A | 2016-01-3 | 35
Person A | 2016-01-8 | 36
In this case, you'd be able to tell that Person A was born between 1980-01-04 and 1980-01-08 I'd like to be able to capture the date of birth (the month and year of birth will also do - even just the year of birth is fine) for all people in the data. Not sure, however, how to go about doing this without manually looking through and seeing where the age changes and taking note of the corresponding date.
Any help is greatly appreciated.
Upvotes: 1
Views: 72
Reputation: 66520
Each row gives us one possible range for that person. If you take the latest "min_DOB" and the earliest "max_DOB" of those rows for each person, you can get the kind of narrowed-down range you described.
library(dplyr); library(lubridate)
df %>%
mutate(min_DOB = V2 %>% ymd - years(V3) - 365, # dragons b/c leap years;
max_DOB = V2 %>% ymd - years(V3)) %>%
group_by(V1) %>%
summarize(min_DOB = max(min_DOB),
max_DOB = min(max_DOB))
# A tibble: 1 x 3
V1 min_DOB max_DOB
<chr> <date> <date>
1 Person A 1980-01-04 1980-01-08
df <- data.frame(
stringsAsFactors = FALSE,
V1 = c("Person A", "Person A", "Person A"),
V2 = c("2016-01-01", "2016-01-03", "2016-01-08"),
V3 = c(35L, 35L, 36L)
)
Upvotes: 1
Reputation: 389047
You may extract the year by removing everything after the first -
and subtract the age from it.
df <- transform(df, yob = as.numeric(sub('-.*', '', V2)) - V3)
# V1 V2 V3 yob
#1 Person A 2016-01-0 35 1981
#2 Person A 2016-01-3 35 1981
#3 Person A 2016-01-8 36 1980
data
df <- structure(list(V1 = c("Person A", "Person A", "Person A"), V2 = c("2016-01-0",
"2016-01-3", "2016-01-8"), V3 = c(35L, 35L, 36L)),
row.names = c(NA, -3L), class = "data.frame")
Upvotes: 0