azul
azul

Reputation: 13

Parsing dates in R from strings with multiple formats

I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:

"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX"  (ie, a five-digit Excel-style date)

Let's say I treat the column as a vector.

dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
         "1981-02-01", "30621", "01-05-1986")

I can see that I probably need a solution that uses both parse_date_time and as.Date.

If I use parse_date_time:

dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))

This fixes them all, except the five-digit one, which returns NA.

I can fix the five-digit one, by using as.integer and as.Date:

dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")

Ideally I would run one and then the other, but because each returns NA on the strings that don't work I can't do that.

Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!

Upvotes: 1

Views: 132

Answers (1)

akrun
akrun

Reputation: 887951

We create a logical index after the first run based on the NA values and use that to index for the second run

i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")

Upvotes: 1

Related Questions