Sofia

Reputation: 45

Lubridate putting dates into the future in R

When I run this code:

library(lubridate)
df$date <- format(as.Date(df$date, "%m/%d/%y") , "%Y")

Some of the dates that are meant to be in the 1900s (e.g. 1960) come out as 2060. I'm not sure how to fix this. The date range I expect is 1951-2014, and I have around 8,000 observations.

Upvotes: 3

Views: 400

Answers (2)

akrun

Reputation: 886928

We can also do this with chron, as its cut-off date is 1961 by default:

as.Date(chron::dates(c('01/12/60', '01/12/78' ,'01/01/91', '01/01/54')))
#[1] "1960-01-12" "1978-01-12" "1991-01-01" "1954-01-01"

Upvotes: 0

Ronak Shah

Reputation: 388797

It seems that you have 2-digit years. From ?strptime, on the %y format:

Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.

So all 2-digit years from 00 to 68 are prefixed with 20, which is why 60 becomes 2060 rather than 1960.
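
A quick way to see the cutoff in action is to parse a few years on either side of it. This is only an illustrative check; the sample strings are made up and are not taken from the question's data.

```r
# Made-up sample values straddling the 68/69 boundary described in ?strptime
x <- c("1/12/60", "1/12/68", "1/12/69", "1/12/99")

# %y reads the 2-digit year; %Y prints the 4-digit year R inferred
years <- format(as.Date(x, "%m/%d/%y"), "%Y")
years
#[1] "2060" "2068" "1969" "1999"
```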

There are various ways to handle this. Since we know the valid range of years, one way is to subtract 100 years from every date whose year is later than 2014.

For example,

df <- data.frame(date = c('1/12/60', '1/12/78' ,'1/1/91', '1/1/54'))
df$date <- as.Date(df$date, "%m/%d/%y")
df
#        date
#1 2060-01-12
#2 1978-01-12
#3 1991-01-01
#4 2054-01-01

inds <- as.numeric(format(df$date, "%Y")) > 2014
df$date[inds] <- df$date[inds] - lubridate::years(100)
df
#        date
#1 1960-01-12
#2 1978-01-12
#3 1991-01-01
#4 1954-01-01
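
The same correction can be wrapped in a small base R function if you would rather not depend on lubridate for the subtraction. This is a sketch, not a built-in: parse_two_digit_year and its max_year argument are hypothetical names, and it assumes no valid date falls after max_year.

```r
# Hypothetical helper (not part of base R or lubridate).
# Parses 2-digit-year strings, then moves any year after max_year back a century.
parse_two_digit_year <- function(x, fmt = "%m/%d/%y", max_year = 2014) {
  d  <- as.Date(x, fmt)
  yr <- as.integer(format(d, "%Y"))

  # Only shift years that were parsed and that fall after the known range
  shift <- !is.na(yr) & yr > max_year
  yr[shift] <- yr[shift] - 100L

  # Rebuild the date from the corrected year and the original month and day;
  # unparseable inputs stay NA because an explicit format is supplied
  as.Date(paste(yr, format(d, "%m-%d"), sep = "-"), format = "%Y-%m-%d")
}

fixed <- parse_two_digit_year(c("1/12/60", "1/12/78", "1/1/91", "1/1/54"))
format(fixed, "%Y")
#[1] "1960" "1978" "1991" "1954"
```

Applied to the question's data, this would be df$date <- format(parse_two_digit_year(df$date), "%Y").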

Upvotes: 3
