maldini425

Reputation: 317

Error with the “standard unambiguous date” for string-to-date conversion in R

I am running the following code, which has worked without errors in other data wrangling tasks:

## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).

complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)

However, I keep receiving this error message: "Error in charToDate(x) : character string is not in a standard unambiguous format"

I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:

start_date    birth_date
2/5/07 0:00   2/28/1992 0:00

I could not find out why the data is formatted that way. Any thoughts on how to fix this issue without altering the original administrative dataset?

Upvotes: 1

Views: 266

Answers (1)

Tim Biegeleisen

Reputation: 522161

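When as.Date is called without a format, it only tries "%Y-%m-%d" and then "%Y/%m/%d". Your values match neither, and R will not guess whether the day or the month comes first, hence the error. To resolve this, pass the format parameter of as.Date explicitly. Note that your sample start_date has a two-digit year, which needs %y (with %Y, 07 would be read as the year 7 rather than 2007), while birth_date has a four-digit year and needs %Y. The trailing " 0:00" is ignored:

complete_dataset_1 = complete_dataset %>%
    mutate(age_at_enrollment = (
        as.Date(start_date, format="%m/%d/%y") -
        as.Date(birth_date, format="%m/%d/%Y")) / 365.25)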
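
As a quick check, the format strings can be tried on the two sample values from the question using base R only. This is a minimal sketch: the one-row data frame df is hypothetical, and it assumes every start_date in the real data has a two-digit year like the sample does.

```r
# Sample values copied from the question; the data frame itself is hypothetical.
df <- data.frame(
  start_date = "2/5/07 0:00",
  birth_date = "2/28/1992 0:00",
  stringsAsFactors = FALSE
)

# %y reads a two-digit year (00 to 68 become 20xx), %Y a four-digit year.
# as.Date() ignores the trailing " 0:00" once the format has been matched.
start <- as.Date(df$start_date, format = "%m/%d/%y")
birth <- as.Date(df$birth_date, format = "%m/%d/%Y")

start  # "2007-02-05"
birth  # "1992-02-28"

# Approximate age in years, as a plain number rather than a difftime.
age_at_enrollment <- as.numeric(start - birth) / 365.25
age_at_enrollment  # roughly 14.94
```

If some rows use a different layout than the sample, as.Date returns NA for them instead of raising an error, so it is worth checking sum(is.na(...)) on the parsed columns.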

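Dividing by 365.25 only approximates a year. A more precise way to calculate the difference in years, handling the leap year edge case, is to use the lubridate package. Note that time_length() is only calendar-aware when it is given an interval; given a difftime, it falls back to fixed-length years:

library(lubridate)
complete_dataset_1 = complete_dataset %>%
    mutate(age_at_enrollment = time_length(interval(
        as.Date(birth_date, format="%m/%d/%Y"),
        as.Date(start_date, format="%m/%d/%y")), "years"))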
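
To see how the two approaches can differ, here is a small sketch that compares a fixed-length year with lubridate's interval-based calculation. The two dates are hypothetical, chosen so that the span is exactly 15 calendar years, and it assumes lubridate is installed.

```r
library(lubridate)

# Hypothetical dates chosen so the span is exactly 15 calendar years.
birth <- as.Date("1992-02-28")
start <- as.Date("2007-02-28")

# Fixed-length year: the day count (5479 days) divided by 365.25.
approx_years <- as.numeric(start - birth) / 365.25

# Calendar-aware: time_length() on an interval counts whole calendar years
# and prorates any remainder.
exact_years <- time_length(interval(birth, start), "years")

approx_years  # slightly above 15
exact_years   # 15
```

The gap is tiny here, but it matters when an age is floored to whole years near a birthday.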

Upvotes: 2
