user2763361

Reputation: 3919

fread from data.table package can't read small numbers

I am using fread() from data.table to efficiently read large rectangular CSV files into R which are all double (and only double) values -- no missing elements.

However, if the file contains very small numbers in scientific notation, the affected column gets converted to character, which ruins the whole read. Here is the warning message (one example; there is one like it for each small number):

16: In fread("SomeCSVFile") :
Bumped column 560 to type character on data row 16799, field contains '-2.1412168512924677E-308'. Coercing previously read values in this column from integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.

I want the function to set them to zero or truncate them at the minimum possible value (either is fine).

Upvotes: 0

Views: 3433

Answers (1)

Richie Cotton

Reputation: 121177

To reproduce this, I put this content in a text file:

x
1
1
1
1
1
1e-309

Then I called fread("that file.txt").
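The reproduction can be scripted along these lines. This is only a sketch: writing to a temporary directory is my assumption (the original file was created by hand), and the result of fread() may differ between data.table versions, so no expected output is shown.

```r
# Write the seven-line test file described above to a temporary location.
# (The temporary path is an assumption made for this sketch.)
path <- file.path(tempdir(), "that file.txt")
writeLines(c("x", "1", "1", "1", "1", "1", "1e-309"), path)

# Read it with fread() if data.table is installed, then inspect the column type.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(path)
  print(class(dt$x))
}
```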


The smallest positive normalized double that R can store is

format(.Machine$double.xmin, digits = 22)
## [1] "2.2250738585072013828e-308"

Your data file includes the value -2.1412168512924677E-308, whose magnitude is smaller than this limit. Rather than let the value underflow (to zero, or to a subnormal number with reduced precision), the data.table package has converted the column to strings. This stops the data precision being lost.
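As a quick check, the comparison can be made directly in base R. This is a minimal sketch; the variable names are just for illustration.

```r
# Smallest positive normalized double on this machine.
xmin <- .Machine$double.xmin

# The value from the warning, parsed by base R rather than by fread().
tiny <- as.numeric("-2.1412168512924677E-308")

# TRUE: the magnitude lies below the normalized range.
print(abs(tiny) < xmin)

# Whether it was stored as exactly zero or as a subnormal value.
print(tiny == 0)
```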

If you need to work with values of this size, then use the Rmpfr package to store the numbers with more precision. Import them as characters (using colClasses; see the data.table warning text above). Then use

library(Rmpfr)
mpfr("-2.1412168512924677E-308")
## 1 'mpfr' number of precision  70   bits 
## [1] -2.1412168512924676999992e-308

As Ben Bolker said in the comments, if you don't care about the tiny numbers and just want to treat them as (effectively) zero, then import the column as characters and use as.numeric. Values this small come back either as zero or as a subnormal double very close to zero.

the_data <- fread("the file.txt", colClasses = "character")
the_data$DodgyColumn <- as.numeric(the_data$DodgyColumn)
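If the tiny values should become exactly zero, as the question asks, one option is to zero out everything below the normalized range after the conversion. The sketch below works on a plain character vector so that it runs without data.table; flush_tiny_to_zero is a hypothetical helper name, not a function from data.table or base R.

```r
# Hypothetical helper (not part of data.table or base R):
# convert strings to numeric and set values whose magnitude is below
# `threshold` to exactly zero.
flush_tiny_to_zero <- function(x_chr, threshold = .Machine$double.xmin) {
  x <- as.numeric(x_chr)
  is_tiny <- !is.na(x) & abs(x) < threshold
  x[is_tiny] <- 0
  x
}

vals <- c("1", "-2.1412168512924677E-308", "1e-309", "3.5")
cleaned <- flush_tiny_to_zero(vals)
print(cleaned)
```

Applied to the imported table, this would be the_data$DodgyColumn <- flush_tiny_to_zero(the_data$DodgyColumn). Note that a genuine 0 in the data also passes through as 0. To truncate at the minimum magnitude instead, the assignment could use sign(x[is_tiny]) * threshold, although values that have already underflowed to exactly zero would stay zero.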

Upvotes: 7
