Reputation: 2454
I have a column, say x, which contains two different date formats: 12/31/1998 and 12/--/98. As you can see, in the second format the day is missing and the year has only 2 digits.
I need to extract the year from all the dates in my column. When I use Year <- data.frame(format(df$x, "%Y")), it returns the year for the first format, but for the second format it returns NA.
I would appreciate all the help. Thanks.
Upvotes: 0
Views: 185
Reputation: 7774
If they are all in a format where the year is the last number after "/", you can use basename. Then you just need to convert the 2-digit years to a four-digit format:
vals <- c("12/31/1998", "12/--/98", "68", "69")
yrs <- basename(vals)
yrs <- ifelse(nchar(yrs) == 2, format(as.Date(yrs, format = "%y"), "%Y"), yrs)
yrs
# [1] "1998" "1998" "2068" "1969"
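The issue is that it does not work for years before 1969: with %y, R maps the values 00 to 68 to 2000-2068 and the values 69 to 99 to 1969-1999, so a two-digit year such as "45" is read as 2045 rather than 1945.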
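If the data is known to contain two-digit years earlier than 1969, one possible workaround is to skip %y and pick the century yourself. The following is only a sketch: expand_year() and its pivot argument are hypothetical names made up for illustration, and the pivot of 30 is an assumption you would adjust to your data.

```r
# Hypothetical helper (not part of base R): expand two-digit years
# using a cutoff chosen by the caller instead of the fixed %y rule.
expand_year <- function(x, pivot = 30L) {
  yr <- basename(x)                 # text after the last "/"
  two_digit <- nchar(yr) == 2
  n <- as.integer(yr)
  # Two-digit years at or below the pivot become 20xx, the rest 19xx.
  n[two_digit] <- n[two_digit] + ifelse(n[two_digit] <= pivot, 2000L, 1900L)
  n
}

vals <- c("12/31/1998", "12/--/98", "01/--/45", "03/--/07")
expand_year(vals)
# [1] 1998 1998 1945 2007
```

Four-digit years pass through unchanged, so only the ambiguous two-digit values depend on the chosen pivot.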
Upvotes: 2
Reputation: 93813
You could get a bit creative and specify an ugly format for the missing data, and then just keep one of the valid responses:
vals <- c("12/31/1998", "12/--/98")
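out <- pmax(
  # complete dates; NA where the day is "--"
  as.Date(vals, "%m/%d/%Y"),
  # partial dates: prepend a dummy day "01" and match the literal "--"
  as.Date(paste0("01", vals), "%d%m/--/%y"),
  na.rm = TRUE
)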
format(out, "%Y")
#[1] "1998" "1998"
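The pmax() call works because each value parses under exactly one of the two formats, so every element has one valid Date and one NA. The same idea can be sketched as an explicit fill, which may be easier to read; the variable names full and partial are just illustrative.

```r
vals <- c("12/31/1998", "12/--/98")

full    <- as.Date(vals, "%m/%d/%Y")                  # NA where the day is "--"
partial <- as.Date(paste0("01", vals), "%d%m/--/%y")  # NA for complete dates

out <- full
out[is.na(out)] <- partial[is.na(out)]  # fill only the gaps left by the first parse
format(out, "%Y")
# [1] "1998" "1998"
```

Note that the second format still relies on %y, so two-digit years are subject to the same century rule as any other %y parse.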
Upvotes: 3