Reputation: 45
When I run this code:
library(lubridate)
df$date <- format(as.Date(df$date, "%m/%d/%y") , "%Y")
Some of the dates that are meant to be in the 1900s, e.g. 1960, turn into 2060. I'm not sure how to fix this. The date range I want is 1951-2014, and I have around 8000 observations.
Upvotes: 3
Views: 400
Reputation: 886928
We can also do this with chron, as the cut-off date is 1961 by default:
as.Date(chron::dates(c('01/12/60', '01/12/78' ,'01/01/91', '01/01/54')))
#[1] "1960-01-12" "1978-01-12" "1991-01-01" "1954-01-01"
Upvotes: 0
Reputation: 388797
It seems that you have 2-digit years. From ?strptime, under %y:
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
So all 2-digit years from 00 to 68 are prefixed with 20, hence 60 becomes 2060 rather than 1960.
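A quick way to see that rule in action is to parse one year on each side of the pivot. This is only a minimal sketch: the two sample strings are made up for illustration, and `x` and `yrs` are just scratch names.

```r
# Two-digit years on either side of the 68/69 pivot (illustrative values)
x <- c("01/01/68", "01/01/69")
yrs <- format(as.Date(x, "%m/%d/%y"), "%Y")
yrs
#[1] "2068" "1969"
```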
There could be various ways to handle this. One way would be to subtract 100 years from dates whose year is more than 2014 (since we know the range of years).
For example,
df <- data.frame(date = c('1/12/60', '1/12/78' ,'1/1/91', '1/1/54'))
df$date <- as.Date(df$date, "%m/%d/%y")
df
# date
#1 2060-01-12
#2 1978-01-12
#3 1991-01-01
#4 2054-01-01
inds <- as.numeric(format(df$date, "%Y")) > 2014
df$date[inds] <- df$date[inds] - lubridate::years(100)
df
# date
#1 1960-01-12
#2 1978-01-12
#3 1991-01-01
#4 1954-01-01
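If only the four-digit year is needed, as in the question, the same idea can probably be applied to the year number directly, without lubridate. This is a sketch that assumes no real date is later than 2014; `fix_year` and `max_year` are names invented here for illustration, not part of any package.

```r
# Hypothetical helper: parse m/d/yy strings and return a 4-digit year,
# assuming no real date is later than max_year.
fix_year <- function(x, max_year = 2014L) {
  yr <- as.integer(format(as.Date(x, "%m/%d/%y"), "%Y"))
  # Years that %y pushed into the future belong to the previous century
  late <- !is.na(yr) & yr > max_year
  yr[late] <- yr[late] - 100L
  yr
}

fix_year(c("1/12/60", "1/12/78", "1/1/91", "1/1/54"))
#[1] 1960 1978 1991 1954
```

Working on the integer year avoids date arithmetic altogether, which may be simpler when the day and month are going to be discarded anyway.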
Upvotes: 3