Santosh M.

Reputation: 2454

Extracting year from two different date formats

I have a column, say x, which has two different date formats: 12/31/1998 and 12/--/98. As you can see, in the second format the day is missing and the year has only 2 digits.

I need to extract the year from all the dates in my column. When I use Year <- data.frame(format(df$x, "%Y")), it returns the year for the first format. For the second format, it returns NA.

I would appreciate all the help. Thanks.

Upvotes: 0

Views: 185

Answers (2)

dayne

Reputation: 7774

If they are all in a format where the year is the last number after "/", you can use basename. Then you just need to convert the 2-digit years to a 4-digit format:

vals <- c("12/31/1998", "12/--/98", "68", "69")
yrs <- basename(vals)
yrs <- ifelse(nchar(yrs) == 2, format(as.Date(yrs, format = "%y"), "%Y"), yrs)
yrs
# [1] "1998" "1998" "2068" "1969"

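The issue is that it does not work with years before 1969: %y maps the two-digit values 00-68 to 2000-2068 and 69-99 to 1969-1999, which is why "68" comes back as "2068" above.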
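If the data are known to hold years before 1969, one possible workaround is to skip %y and expand the two-digit years by hand with an explicit cutoff. This is only a rough sketch: expand_year and its pivot argument are made-up names for illustration, and the cutoff of 30 is an assumption you would need to choose to suit your own data.

```r
# Hypothetical helper (not from base R): expand_year() and `pivot` are
# illustrative names; the cutoff of 30 is an assumption about the data.
expand_year <- function(x, pivot = 30) {
  yr  <- sub(".*/", "", x)                # keep the text after the last "/"
  two <- !is.na(yr) & nchar(yr) == 2      # flag the two-digit years
  n   <- as.integer(yr)
  # Two-digit years at or below the pivot become 20xx, the rest 19xx
  n[two] <- ifelse(n[two] <= pivot, 2000L + n[two], 1900L + n[two])
  n
}

expand_year(c("12/31/1998", "12/--/98", "68", "69"))
# [1] 1998 1998 1968 1969
```

With this cutoff, "68" is read as 1968 rather than 2068, at the cost of treating anything above 30 as belonging to the 1900s.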

Upvotes: 2

thelatemail

Reputation: 93813

You could get a bit creative and specify an ugly format for the entries with a missing day, and then just keep whichever parse is valid:

vals <- c("12/31/1998", "12/--/98")
out <- pmax(
         as.Date(vals, "%m/%d/%Y"),
         as.Date(paste0("01",vals), "%d%m/--/%y"),
         na.rm=TRUE
       )
format(out, "%Y")
#[1] "1998" "1998"

Upvotes: 3
