Since 1.13, fread can automatically detect and read date types, which is really nice. I would like to detect dates if, and only if, they are in a specific format, but from what I have seen, fread only detects ISO-formatted dates with no option to change this.
> packageVersion('data.table')
[1] '1.14.0'
> typeof(data.table::fread("date\n2020-10-10")[[1]])
[1] "integer"
> typeof(data.table::fread("date\n10-10-2020")[[1]])
[1] "character"
> typeof(data.table::fread("date\n10-10-2020", format = "%d-%m-%y")[[1]])
Error in data.table::fread("date\n10-10-2020", format = "%d-%m-%y") :
  unused argument (format = "%d-%m-%y")
Is there a way to change the format of dates detected by fread, or is that going to be possible in the future? If not, could the option datatable.old.fread.datetime.character be made permanent to prevent that detection when it isn't wanted?
Adding as an answer, adapting from my own GitHub comment here:
If you create a custom class and pre-specify the list of possible datetime formats you expect, you can use fread to read these directly into an IDate data type (or ITime or POSIXct) without having to do a post-conversion from character class. In the OP's use-case they can do the following:
Code
library('lubridate')
library('data.table')

### Set up options for data.table's fread, including an "invented" one for the new class we will define ###
# Alternate date formats HAVE to be set prior to the fread call.
# No dateFormat parameter is possible in the fread call itself.
options(datatable.fread.dateFormat=c("Ymd","dmY"))

### Create a custom Class with `as` creation function ###
setClass("MorphDate", prototype=NA_integer_)

as.MorphDate.character <- function(from) {
  DateObj <- lubridate::parse_date_time(from, orders=getOption("datatable.fread.dateFormat","Ymd T"), truncated=1)
  doi <- as.integer(DateObj)
  if (any((doi %% 86400L) > 0L, na.rm = TRUE)) {
    # Could use IDateTime class, but that class uses a two column data.table
    # of IDate and ITime and would require an explicit call to a conversion function.
    attr(DateObj,"class") <- c("MorphDate","POSIXct","POSIXt")
  } else if (all(doi < 86400L, na.rm = TRUE)) {
    # I don't know if the call to ITime is strictly required. Setting the class may be enough.
    DateObj <- as.ITime(doi)
    attr(DateObj,"class") <- c("MorphDate","ITime")
  } else {
    DateObj <- as.Date(doi %/% 86400L, origin='1970-01-01') # or as.IDate if that is available.
    attr(DateObj,"class") <- c("MorphDate","IDate","Date")
  }
  DateObj
}

### Define conversion functions to/from character string
MorphDate <- as.MorphDate.character
as.character.MorphDate <- function(from) format(from)
setAs("character","MorphDate",as.MorphDate.character)
setAs("MorphDate","character",as.character.MorphDate)
print.MorphDate <- function(x, ...) print(format(x, ...))
setOldClass("MorphDate",S4Class="MorphDate")
### END Class Def ###

### Read in string via fread, using the new "MorphDate" class (defined above)
fread("date\n10-10-2020",
      colClasses=list(MorphDate=c("date")))
Disclaimers: this relies on the lubridate package, so if you are looking for a pure data.table solution, you would have to modify the appropriate format references and function call(s) to use POSIX conventions.
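For comparison, here is a minimal sketch of the plain post-conversion route without lubridate, using a POSIX (strptime) format string. It is not the custom-class approach above: the column is read as character and converted afterwards. The helper name convert_if_format is hypothetical (not part of data.table), and the default format "%d-%m-%Y" is an assumption matching the OP's example.

```r
library(data.table)

# Hypothetical helper (not a data.table function): convert a character column
# to IDate only if every non-missing value parses with the given strptime format.
convert_if_format <- function(dt, col, format = "%d-%m-%Y") {
  raw <- dt[[col]]
  parsed <- as.IDate(raw, format = format)  # unparseable strings become NA
  # Only replace the column when no non-NA input failed to parse
  if (!anyNA(parsed[!is.na(raw)])) {
    set(dt, j = col, value = parsed)        # modifies dt by reference
  }
  invisible(dt)
}

# Force character so fread's own date detection does not interfere
dt <- fread("date\n10-10-2020", colClasses = list(character = "date"))
convert_if_format(dt, "date")
class(dt$date)
# [1] "IDate" "Date"
```

A column in any other layout (for example ISO dates read as character) would fail the parse check and stay character, which gives the "if, and only if" behaviour asked for in the question, at the cost of a second pass over the column.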