Julien Lopez

Reputation: 1854

Is there a way to specify the format of dates with fread?

Since 1.13, fread can automatically detect and read date types, which is really nice. I would like to detect dates if, and only if, they are in a specific format, but from what I have seen, fread only detects ISO-formatted dates with no option to change this.

> packageVersion('data.table')
[1] '1.14.0'    
> typeof(data.table::fread("date\n2020-10-10")[[1]])
[1] "integer"
> typeof(data.table::fread("date\n10-10-2020")[[1]])
[1] "character"
> typeof(data.table::fread("date\n10-10-2020", format = "%d-%m-%y")[[1]])
Error in data.table::fread("date\n10-10-2020", format = "%d-%m-%y") :
  unused argument (format = "%d-%m-%y")

Is there a way to change the format of dates detected by fread or is it going to happen in the future? If not, could the option datatable.old.fread.datetime.character be made permanent to prevent that detection when it isn't wanted?

Upvotes: 4

Views: 1060

Answers (1)

mpag

Reputation: 573

Adding this as an answer, adapted from my own GitHub comment:

If you create a custom class and specify in advance the list of datetime formats you expect, fread can read those columns directly into an IDate (or ITime, or POSIXct) without a separate conversion from character afterwards. For the OP's use case, the following works:

Code

library('lubridate')
library('data.table')

### Set up options for data.table's fread, including an "invented" one for the new class we will define ###
# Alternate date formats HAVE to be set prior to the fread call.
# no dateFormat parameter is possible in the fread call itself.

options(datatable.fread.dateFormat=c("Ymd","dmY"))

### Create a custom Class with `as` creation function ###
setClass("MorphDate", prototype=NA_integer_)
as.MorphDate.character <- function(from) {
  DateObj <- lubridate::parse_date_time(from, orders=getOption("datatable.fread.dateFormat","Ymd T"), truncated=1)
  doi <- as.integer(DateObj)
  if (any((doi %% 86400L) > 0L, na.rm = TRUE)) {
    attr(DateObj,"class") <- c("MorphDate","POSIXct","POSIXt")     # could use IDateTime class, but that class uses a two column data.table of IDate and ITime and would require an explicit call to a conversion function.
  } else if (all(doi < 86400L,  na.rm = TRUE)) {
    DateObj <- as.ITime(doi)                                       # I don't know if the call to ITime is strictly required.  setting the class may be enough.
    attr(DateObj,"class") <- c("MorphDate","ITime")
  } else {
    DateObj <- as.IDate(as.Date(doi %/% 86400L, origin='1970-01-01'))  # integer-backed IDate; data.table is already loaded here
    attr(DateObj,"class") <- c("MorphDate","IDate","Date")
  }
  DateObj
}

### Define conversion functions to/from character string
MorphDate <- as.MorphDate.character
as.character.MorphDate <- function(from) format(from)
setAs("character","MorphDate",as.MorphDate.character)
setAs("MorphDate","character",as.character.MorphDate)
print.MorphDate <- function(x, ...) print(format(x, ...))
setOldClass("MorphDate",S4Class="MorphDate")
### END Class Def ###

### read in string via fread, using new "MorphDate" class

# read in file using `MorphDate` class (defined above)
fread("date\n10-10-2020",
    colClasses=list(MorphDate=c("date")))
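
To see how the `orders` vector stored in the option drives the parsing, here is a small standalone sketch that calls lubridate::parse_date_time directly, outside of fread. The variable names `orders` and `parsed` are only illustrative, and the two input strings are made up for the example.

```r
library(lubridate)

# Candidate layouts, tried per element: year-month-day and day-month-year.
orders <- c("Ymd", "dmY")

# One ISO-style string and one day-first string, both meant as 10 October 2020.
parsed <- parse_date_time(c("2020-10-10", "10-10-2020"), orders = orders)

# parse_date_time returns POSIXct (UTC by default); format() shows the dates.
format(parsed, "%Y-%m-%d")
```

Strings that match none of the listed orders come back as NA with a warning, so it is worth checking for NA after reading.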

Disclaimers:

  • This is not as fast as reading straight into IDate, or as the C-compiled parsing fread uses for ISO 8601 and other datetime formats it already recognizes.
  • It uses the lubridate package, so for a pure data.table solution you would have to change the format references and function call(s) to use POSIX conventions.
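
For comparison, a minimal sketch of the plain data.table route, without lubridate or a custom class: force the column to character via colClasses so fread does not guess a date type, then convert it afterwards with a POSIX-style format string. This does need the post-conversion step that the approach above avoids, and the format "%d-%m-%Y" is an assumption that matches the OP's sample data.

```r
library(data.table)

# Read the column as plain character so fread does not try to detect dates.
dt <- fread("date\n10-10-2020", colClasses = list(character = "date"))

# Convert by reference, using day-month-(four-digit year).
dt[, date := as.IDate(date, format = "%d-%m-%Y")]

class(dt$date)
```

Because the column is declared as character up front, this also keeps ISO-formatted values from being auto-detected when that is not wanted.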

Upvotes: 1
