Argent

Reputation: 945

R: Check if a date is valid

Assume a date specified as three integers: year, month, day

The year is a 4-digit integer (such as 2020), the month ranges over 1-12, and the day over 1-31.

I'm looking for a simple function (call it checkdate) that can check whether a date is valid, returning TRUE if valid and FALSE if not valid.

For example, checkdate(2008, 2, 29) would return TRUE because 2008 was a leap year.

On the other hand, checkdate(2009, 2, 29) would return FALSE because 2009 was not a leap year.

checkdate(2009, 6, 31) would return FALSE because June has only 30 days.

Etc.
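For comparison, the rule these examples rely on can also be checked with plain arithmetic, without parsing a date string at all. This is a minimal sketch that assumes numeric inputs and the Gregorian leap-year rule: divisible by 4, except century years, which must also be divisible by 400. The names `is_leap` and `checkdate_arith` are hypothetical.

```r
# Hypothetical helper: Gregorian leap-year rule
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

# Hypothetical arithmetic checkdate(): assumes numeric y, m, d; vectorized
checkdate_arith <- function(y, m, d) {
  n <- max(length(y), length(m), length(d))
  y <- rep_len(y, n); m <- rep_len(m, n); d <- rep_len(d, n)
  # Days per month in a non-leap year
  month_days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  # Non-missing whole numbers, month in 1..12, day at least 1
  ok <- !is.na(y) & !is.na(m) & !is.na(d) &
    y %% 1 == 0 & m %% 1 == 0 & d %% 1 == 0 &
    m >= 1 & m <= 12 & d >= 1
  # Month length for the valid entries; February gets one extra day in leap years
  max_day <- rep(0, n)
  max_day[ok] <- month_days[m[ok]] + (m[ok] == 2 & is_leap(y[ok]))
  ok & d <= max_day
}

checkdate_arith(c(2008, 2009, 2009), c(2, 2, 6), c(29, 29, 31))  # TRUE FALSE FALSE
```

Note that 1900 is not a leap year but 2000 is, which a simple "divisible by 4" test would get wrong.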

UPDATE

Based on Dirk's answer, below, here is a function that does what I asked:

    checkdate = function(y, m, d) {
        #y: A year, not abbreviated to 2 digits.
        #m: An integer in the range 1-12.
        #d: An integer in the range 1-31.

        #Convert to an R Date object.
        #If the date is not valid, NA is returned.
        dt = as.Date(paste(y, m, d, sep='-'), optional=TRUE)

        !is.na(dt)
    }
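One caveat when passing vectors: as I read `?as.Date`, if `format` is not specified, `as.Date()` tries `tryFormats` on the first non-NA element only. If none of them works, it gives an error, or an all-NA result with `optional = TRUE`. So a vector that starts with an invalid date such as "2009-2-29" can have every element reported as invalid. The following is a minimal sketch that passes the format explicitly, assuming "%Y-%m-%d" is the only layout needed; `checkdate_fmt` is a hypothetical name.

```r
# Hypothetical variant: with an explicit format there is no guessing from the
# first element, so each element is parsed (and validated) on its own
checkdate_fmt <- function(y, m, d) {
  dt <- as.Date(paste(y, m, d, sep = "-"), format = "%Y-%m-%d")
  !is.na(dt)  # NA means the string did not form a valid date
}

checkdate_fmt(c(2009, 2008, 2009), c(2, 2, 6), c(29, 29, 31))  # FALSE TRUE FALSE
```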

Upvotes: 0

Views: 1354

Answers (4)

mayeulk

Reputation: 150

Here is a safer function:

    checkdate <- function(y, m, d, min.year = NA, max.year = NA, recycle = TRUE) {
      if (!recycle) {
        # Refuse to recycle: y, m and d must all have the same length
        if (length(y) != length(m) || length(d) != length(m)) {
          stop("The y, m and d vectors provided do not have the same length.")
        }
      }

      dates <- paste(y, m, d, sep = "-")

      # Accepts numbers and characters, but explicitly checks the conversion
      !is.na(as.numeric(y)) &
        # These 2 lines are optional but useful (with min.year = 1900, reject "23" as we think "2023" is meant)
        (is.na(min.year) | as.numeric(y) >= min.year) &
        (is.na(max.year) | as.numeric(y) <= max.year) &
        !is.na(as.numeric(m)) & as.numeric(m) > 0 & as.numeric(m) < 13 &
        !is.na(as.numeric(d)) & as.numeric(d) > 0 & as.numeric(d) < 32 &
        !is.na(as.Date(dates,
                       format = "%Y-%m-%d",
                       optional = TRUE  # return NA instead of signalling an error
                       ))
    }

    # Possible uses:
    checkdate(0:40, 0:40, 0:40)
    checkdate(0:40, 0:40, 0:40, min.year = 2000)
    checkdate("2023", 0:40, 0:40, min.year = 2000)
    checkdate("2023", 0:40, 0:40, recycle = FALSE)

This is much safer than the other answers. It works with three vectors y, m, d (unlike Ronak's answer), and it recycles them if needed (set recycle = FALSE to check the vector lengths and prevent this). It accepts strings and numbers. It will not accept "10-11-12" or "10-11-123456789". Note that, a bit surprisingly:

    as.Date(paste(10, 11, 2023, sep = '-'), format = "%Y-%m-%d")
    [1] "10-11-20" # is "a valid date" (!)

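On reflection, this is not too surprising. "%Y" reads "10", "%m" reads "11", "%d" reads at most two digits ("20"), and as.Date() ignores any trailing characters ("23"). as.Date() was designed to convert valid inputs, not to validate them, so we need to be more careful for validation.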
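One way to catch this, sketched here as an assumption rather than as part of the function above, is a round trip. Parse the date, then require that the parsed year, month and day equal the inputs. Any characters that strptime() silently dropped will then show up as a mismatch. `checkdate_roundtrip` is a hypothetical name, and the sketch assumes y, m, d are numbers or numeric strings.

```r
# strptime() stops once the format is satisfied, so "23" is dropped here
dt <- as.Date("10-11-2023", format = "%Y-%m-%d")
format(dt, "%m-%d")  # "11-20"

# Hypothetical round-trip check: the parsed components must match the inputs
checkdate_roundtrip <- function(y, m, d) {
  dt <- as.Date(paste(y, m, d, sep = "-"), format = "%Y-%m-%d")
  lt <- as.POSIXlt(dt)  # year is years since 1900, mon is 0-11, mday is 1-31
  !is.na(dt) &
    lt$year + 1900 == as.numeric(y) &
    lt$mon + 1 == as.numeric(m) &
    lt$mday == as.numeric(d)
}

checkdate_roundtrip(c(10, 2008, 2009), c(11, 2, 2), c(2023, 29, 29))  # FALSE TRUE FALSE
```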

The min.year, max.year options and related lines of code are optional but are, in my view, useful in some contexts; they define a valid range for the year. With min.year=1900, we reject "23" as we think "2023" was meant.

Upvotes: 0

hello_friend

Reputation: 5798

Base R using @Ronak Shah's logic:

    checkdate <- function(y, m, d) {
      tryCatch(inherits(as.Date(paste(y, m, d, sep = '-')), "Date"),
               error = function(e) return(FALSE))
    }
    checkdate(2015, 12, 31)
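Note that inherits() tests the class of the result, not each element. That is fine for a single date, where an unparseable string raises an error that is caught as FALSE. For a vector whose first element is valid, however, later invalid dates come back as NA inside a perfectly good "Date" object. The example strings below are only for illustration.

```r
x <- as.Date(c("2020-02-28", "2020-02-30"))
inherits(x, "Date")  # TRUE, despite the NA
is.na(x)             # FALSE TRUE
```

So for vectors, an element-wise !is.na() on the parsed result is the check that matters.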

Upvotes: 0

Ronak Shah

Reputation: 389175

Try to convert the inputs to a date; if that fails, return FALSE.

    checkdate <- function(y, m, d) {
        tryCatch(lubridate::is.Date(as.Date(paste(y, m, d, sep = '-'))),
                 error = function(e) return(FALSE))
    }

    checkdate(2009, 6, 31)
    #[1] FALSE
    checkdate(2009, 2, 29)
    #[1] FALSE
    checkdate(2008, 2, 29)
    #[1] TRUE

Upvotes: 3

Dirk is no longer here

Reputation: 368439

Sure. Just try to parse it:

    R> days <- 28:31
    R> dates <- paste0("2020-02-", days)
    R> as.Date(dates)
    [1] "2020-02-28" "2020-02-29" NA           NA
    R>

This shows that in 2020, Feb 28 and 29 existed (leap year) but not 30 and 31.

From your three vectors you could use sprintf("%4d-%02d-%02d", y, m, d) to create a vector of text inputs to parse.
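Here is a short sketch of that suggestion, using made-up example vectors. An explicit format is passed so that the parse does not depend on guessing the format from the first element.

```r
# Example inputs (hypothetical): year, month and day as separate vectors
y <- c(2020, 2020, 2019)
m <- c(2, 2, 2)
d <- c(29, 30, 29)

# Zero-pad month and day into "YYYY-MM-DD" strings
txt <- sprintf("%4d-%02d-%02d", y, m, d)
txt                                        # "2020-02-29" "2020-02-30" "2019-02-29"

# NA after parsing means the date does not exist
!is.na(as.Date(txt, format = "%Y-%m-%d"))  # TRUE FALSE FALSE
```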

Upvotes: 0
