ADF

Reputation: 572

Why can converting numbers to characters change the numbers?

I imagine this has to do with R's data structures and the answer will be quick, but I haven't yet found one so here goes:

as.character(9875987598759875)
[1] "9875987598759876"

library(crayon)
chr(9875987598759875)
[1] "9875987598759876"

toString(9875987598759875)
[1] "9875987598759876"

What gives? How should I be making this conversion more safely?

Upvotes: 1

Views: 272

Answers (1)

Ben Bolker

Reputation: 226087

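.Machine$integer.max indicates that the largest integer R can store is 2147483647 (this could conceivably vary across platforms, but it's very unlikely to). Any number larger than that is stored as double-precision floating point, with the attendant imprecision/round-off error: a double represents integers exactly only up to 2^53 (9007199254740992), and between 2^53 and 2^54 only even values are representable, so 9875987598759875 gets rounded to 9875987598759876 before as.character() ever sees it. (Unlike Python, which expensively but magically converts integer variables to an arbitrary-length representation as necessary.)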
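A minimal base-R sketch of the rounding described above; the variable name x is just for illustration, and the behaviour assumes the usual IEEE 754 doubles:

```r
# Largest value R's 32-bit integer type can hold
.Machine$integer.max

# A numeric literal this large is stored as a double, not an integer
x <- 9875987598759875
typeof(x)

# Above 2^53 adjacent whole numbers are no longer distinguishable
2^53 == 2^53 + 1

# The stored value is already the rounded neighbour, so the change
# happens when the number is parsed, not in the conversion to character
x == 9875987598759876
```

The last comparison returning TRUE is the point: as.character(), chr() and toString() all faithfully print a value that was already altered.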

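If you install the bit64 package you can use 64-bit integers, which should be exact up to 2^63-1 = 9223372036854775807. (Computing that limit with ordinary doubles is itself off by one, because 2^63-1 is not representable in floating point:)

print(2^63-1,digits=22)
[1] 9223372036854775808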

If you start with a character string, you can safely do round-trip conversion to integer64 and back:

library(bit64)
cc <- "9875987598759875"
x <- as.integer64(cc)
identical(cc,as.character(x))
## [1] TRUE

However, typically once you've read a number into R as a regular number it's too late: the low-order digits have already been rounded away and can't be recovered. You can use colClasses="integer64" with read.table()/read.csv()/etc. to read values in as integer64; I believe the file-reading functions from readr and data.table also have integer64-handling capabilities.

For many applications, if you're not actually planning on doing anything numerical with these digit-strings, it's safest and easiest to make sure you import them as character in the first place ...
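As a rough sketch of the import-as-character approach, using only base R: the CSV text and the column names id and value below are made up for illustration.

```r
# Hypothetical CSV content with a long identifier column
csv_text <- "id,value\n9875987598759875,1.5\n9875987598759877,2.5"

# Default import: id is guessed as numeric (double) and silently rounded
bad <- read.csv(text = csv_text)
typeof(bad$id)
bad$id[1] == bad$id[2]   # two distinct IDs can collapse to the same double

# Explicit import: keep id as character, parse only value as numeric
good <- read.csv(text = csv_text,
                 colClasses = c(id = "character", value = "numeric"))
good$id
```

Because the identifiers never pass through a numeric type, they come out exactly as they appear in the file.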

Upvotes: 3
