Reputation: 3919
I am using fread() from data.table to efficiently read large rectangular CSV files into R. The values are all double (and only double), with no missing elements.
However, if the file contains very small numbers in scientific notation, the column gets converted to character, which ruins the whole read. Here is the warning message (one example; there is a similar message for each small number):
16: In fread("SomeCSVFile") :
Bumped column 560 to type character on data row 16799, field contains '-2.1412168512924677E-308'. Coercing previously read values in this column from integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.
I want the function to set them to zero or truncate them at the minimum possible value (either is fine).
Upvotes: 0
Views: 3433
Reputation: 121177
To reproduce this, I put this content in a text file:
x
1
1
1
1
1
1e-309
Then I called fread("that file.txt").
The smallest positive normalized number that R can store as a double is
format(.Machine$double.xmin, digits = 22)
## [1] "2.2250738585072013828e-308"
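As a rough illustration of what happens around that limit, here is a minimal base R sketch. It assumes the usual IEEE 754 double arithmetic that R uses on common platforms; the variable names are only for illustration. Values below .Machine$double.xmin are not lost immediately: they can still be held as subnormal numbers with fewer significant bits, and only much further down do they underflow to exactly zero.

```r
# Smallest positive normalized double (2^-1022)
xmin <- .Machine$double.xmin

# Halving it gives a subnormal value: still nonzero, but with less precision
half_xmin <- xmin / 2
half_xmin > 0
## [1] TRUE

# Dividing by 2^52 reaches the smallest subnormal value (2^-1074)
smallest_subnormal <- xmin / 2^52
smallest_subnormal > 0
## [1] TRUE

# One more halving cannot be represented and rounds to exactly zero
underflowed <- smallest_subnormal / 2
underflowed == 0
## [1] TRUE
```

The value in the question, about -2.14e-308, sits just below xmin in magnitude, so it falls in the reduced-precision subnormal range.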
Your data file includes the value -2.1412168512924677E-308, which is smaller in magnitude than this limit. Rather than silently lose precision (a value this small can only be held as a reduced-precision subnormal number, or as zero), the data.table package has converted the column to strings.
If you need to work with values of this size, then use the Rmpfr package to store the numbers with more precision. Import them as characters (using colClasses; see the data.table warning text). Then use
library(Rmpfr)
mpfr("-2.1412168512924677E-308")
## 1 'mpfr' number of precision 70 bits
## [1] -2.1412168512924676999992e-308
As Ben Bolker said in the comments, if you don't care about the tiny numbers and just want to treat them as zero, then import the column as characters and use as.numeric.
the_data <- fread("the file.txt", colClasses = "character")
the_data$DodgyColumn <- as.numeric(the_data$DodgyColumn)
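Depending on the platform, as.numeric may return a tiny nonzero subnormal value rather than exactly zero for strings like these. If you want exact zeros, one option is to flush them explicitly after conversion. The following is only a sketch: flush_tiny_to_zero is a hypothetical helper (not part of data.table or base R), and the default threshold of .Machine$double.xmin is an assumption you may want to change.

```r
# Hypothetical helper: convert a character vector to double and set
# anything smaller in magnitude than `threshold` to exactly zero.
flush_tiny_to_zero <- function(x, threshold = .Machine$double.xmin) {
  values <- as.numeric(x)
  # Leave NA alone; zero out everything below the threshold in magnitude
  tiny <- !is.na(values) & abs(values) < threshold
  values[tiny] <- 0
  values
}

# Example input, as it would look after reading with colClasses = "character"
raw <- c("1", "-2.1412168512924677E-308", "1e-309", "3.5")
flush_tiny_to_zero(raw)
## [1] 1.0 0.0 0.0 3.5
```

Applied to the data above, this would be the_data$DodgyColumn <- flush_tiny_to_zero(the_data$DodgyColumn). To clamp to the minimum magnitude instead of zero, replace the assigned 0 with sign(values[tiny]) * threshold.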
Upvotes: 7