Reputation: 13
I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:
"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX" (ie, a five-digit Excel-style date)
Let's say I treat the column as a vector.
dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
"1981-02-01", "30621", "01-05-1986")
I can see that I probably need a solution that uses both parse_date_time and as.Date.
If I use parse_date_time:
dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))
This fixes them all, except the five-digit one, which returns NA.
I can fix the five-digit one, by using as.integer and as.Date:
dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")
Ideally I would run one and then the other, but because each returns NA on the strings that don't work I can't do that.
Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!
Upvotes: 1
Views: 132
Reputation: 887951
We create a logical index after the first run based on the NA
values and use that to index for the second run
i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")
Upvotes: 1