Reputation: 317
So I am trying this code, which I have used in the past with other data wrangling tasks with no errors:
## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).
complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)
However, I keep receiving this error message: "Error in charToDate(x) : character string is not in a standard unambiguous format"
I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:
start_date     birth_date
2/5/07 0:00    2/28/1992 0:00
I could not find an answer as to why the data is formatted that way. Any thoughts on how to fix this issue without altering the original administrative dataset?
Upvotes: 1
Views: 266
Reputation: 522161
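The ambiguity in your call to as.Date is whether the day or month comes first. Without a format, as.Date only tries "%Y-%m-%d" and "%Y/%m/%d", so it gives up on strings like yours. To resolve this, you may use the format parameter of as.Date. Note that in your sample start_date has a two-digit year ("07") while birth_date has a four-digit year, so they need %y and %Y respectively. The trailing " 0:00" is ignored by as.Date, and as.numeric turns the difference in days into a plain number:
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = as.numeric(
    as.Date(start_date, format="%m/%d/%y") -
    as.Date(birth_date, format="%m/%d/%Y")) / 365.25)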
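Before running this on the whole dataset, it may help to try the formats on the two sample values from the question. The following is a minimal base R sketch. The names start_raw, birth_raw, and wrong are only illustrative, and it assumes every row follows the same pattern as the sample (two-digit year in start_date, four-digit year in birth_date).

```r
# Sample values copied from the question
start_raw <- "2/5/07 0:00"
birth_raw <- "2/28/1992 0:00"

# %y reads a two-digit year, %Y a year with century;
# trailing characters such as " 0:00" are ignored by as.Date
start <- as.Date(start_raw, format = "%m/%d/%y")
birth <- as.Date(birth_raw, format = "%m/%d/%Y")

# Pitfall: %Y applied to "07" is read literally as the year 7
wrong <- as.Date(start_raw, format = "%m/%d/%Y")

# Age in years as a plain number (difference in days / 365.25)
age_years <- as.numeric(start - birth) / 365.25

print(start)
print(birth)
print(age_years)
```

If any rows come back as NA after conversion, those rows probably use a different layout and are worth inspecting separately.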
A more precise way to calculate the difference in years, handling the leap year edge case, would be to use the lubridate package. Measure an interval between the two dates, since time_length() on a plain difftime only divides by an average year length:
library(lubridate)
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = time_length(interval(
    as.Date(birth_date, format="%m/%d/%Y"),
    as.Date(start_date, format="%m/%d/%y")), "years"))
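If adding a package is not an option, the calendar-based idea can also be sketched in base R. The helper below, age_in_years, is a hypothetical name and not part of any library. It returns completed (whole) years rather than a fractional age, by comparing the month and day of the two dates instead of dividing a day count.

```r
# Hypothetical helper: completed years between two Date vectors,
# counted on the calendar rather than by dividing days by 365.25.
age_in_years <- function(birth, on) {
  b <- as.POSIXlt(birth)
  d <- as.POSIXlt(on)
  years <- d$year - b$year
  # Subtract one when the birthday has not yet occurred in the year of `on`
  before_birthday <- (d$mon < b$mon) | (d$mon == b$mon & d$mday < b$mday)
  years - as.integer(before_birthday)
}

# Sample values from the question
birth <- as.Date("2/28/1992 0:00", format = "%m/%d/%Y")
start <- as.Date("2/5/07 0:00", format = "%m/%d/%y")

print(age_in_years(birth, start))
```

Because the function is vectorized, it could be used inside mutate() on the parsed date columns in the same way as the expressions above.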
Upvotes: 2