Reputation: 41
I initially wrote a script that runs a calculation over ~70k iterations, using rbind to 'stitch' the outcomes together (one iteration can produce anywhere from zero to many rows, so I don't think pre-allocating the output makes sense). To speed things up, I split this into 4 separate scripts that each handle 25% of the iterations in separate sessions and write their solutions (each between 150k and 400k rows) to csv. These files are then all read back into a single script that binds the solutions together.
I'm having a problem with one of the columns though: it contains a date, stored in the csv as "dd-mm-yy". The files from scripts 1, 2 & 4 read in as anticipated, with the column stored as 'character' type, which is fine by me. However, the file from script 3 has its date column read in as an IDate, with "00" added at the front of the string.
The rbind doesn't like having different data types. I can 'make it work' by including colClasses = c(DATE = "character") in the fread for file 3, but I'd much rather understand WHY this is occurring, and I assume I can probably adjust something at the fwrite stage?
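For reference, here is a minimal sketch of that workaround applied to all four files rather than only file 3, so the column type cannot depend on what fread guesses for each file. The file names, the ID column and the sample values are made up for illustration; only the DATE column name and the colClasses argument come from my actual code.

```r
library(data.table)

# Hypothetical stand-ins for the four result files; DATE holds "dd-mm-yy" strings.
paths <- file.path(tempdir(), sprintf("solution_%d.csv", 1:4))
for (p in paths) {
  fwrite(data.table(ID = 1:2, DATE = c("15-03-21", "28-11-20")), p)
}

# Force DATE to character in every file so type guessing cannot differ between files.
parts <- lapply(paths, fread, colClasses = c(DATE = "character"))

# Bind all pieces in a single call.
combined <- rbindlist(parts)
```

With the type fixed at read time, every piece has the same column classes and the bind goes through.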
Upvotes: 3
Views: 1810
Reputation: 482
For now, you can do this with:
options(datatable.old.fread.datetime.character = TRUE)
In the NEWS file for the data.table package under "data.table v1.14.0 (21 Feb 2021)" it says "As before, the migration option datatable.old.fread.datetime.character can still be set to TRUE to revert to the old character behavior. This migration option is temporary and will be removed in the near future."
But as of version 1.14.8 (the current latest non-dev version) this option still works.
Note that reading dates as character makes data.table::fread() slower and increases memory usage, so only use this if you really need to.
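If you would rather not change the option for the whole session, a minimal sketch like the following sets it only around the read and then restores the previous value. The file name and contents are hypothetical, and whether the option still has any effect depends on your data.table version, since it is documented as temporary.

```r
library(data.table)

# Hypothetical one-row file with an ISO date, just to have something to read.
path <- file.path(tempdir(), "iso_dates.csv")
writeLines(c("ID,DATE", "1,2021-03-15"), path)

# options() returns the previous values, so the change can be undone afterwards.
old <- options(datatable.old.fread.datetime.character = TRUE)
dt <- fread(path)
options(old)
```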
Upvotes: 0
Reputation: 41
It appears the issue lay in the date format of the original data. Everything worked once I passed the dates through as.Date before writing to the csv files.
Thanks to @thelatemail for pointing out that recent versions of fread automatically guess date formats (https://www.rdocumentation.org/packages/data.table/versions/1.14.0/topics/fread): "bit64::integer64, IDate, and POSIXct types are also detected and read directly without needing to read as character before converting."
I'm still not 100% sure why only 1 of the 4 files was read as a date while the others were read as character, but my hypothesis is that the "dd-mm-yy" format is ambiguous for fread to interpret, so it should be avoided if possible.
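A minimal sketch of the as.Date fix, assuming the dates start out as "dd-mm-yy" strings. The table, the ID column, the file name and the "%d-%m-%y" format string are illustrative assumptions, not my real data.

```r
library(data.table)

# Hypothetical result table with dates as "dd-mm-yy" strings (format assumed).
res <- data.table(ID = 1:2, DATE = c("15-03-21", "28-11-20"))

# Convert with an explicit format so day, month and year cannot be confused.
res[, DATE := as.Date(DATE, format = "%d-%m-%y")]

# fwrite writes Date columns in ISO form (yyyy-mm-dd) by default.
path <- file.path(tempdir(), "solution_iso.csv")
fwrite(res, path)

# Read the file back; the ISO strings are unambiguous for fread.
back <- fread(path)
```

Because the csv now holds four-digit years in a fixed order, all files should be interpreted the same way when they are read back.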
Upvotes: 1