Reputation: 839
I am running out of RAM in R with a data.table that contains ~100M rows and 40 columns full of doubles. My naive thought was that I could reduce the object size of the data.table by reducing the precision; there is no need for 15 digits after the decimal point. I played around with rounding, but as we know
round(1.68789451154844878, 3)
gives
1.6879999999999999
and does not help: the rounded value is still stored as a 64-bit double and therefore still occupies 8 bytes. So I converted the values to integers instead. However, as the small example below shows for a numeric vector, this only gives a 50% reduction, from 8000040 bytes to 4000040 bytes, and the reduction does not grow when I reduce the precision further.
Is there a better way to do that?
set.seed(1)
options(digits = 22)
a1 = rnorm(10^6)
a2 = as.integer(1000000 * a1)
a3 = as.integer(100000 * a1)
a4 = as.integer(10000 * a1)
a5 = as.integer(1000 * a1)
head(a1)
head(a2)
head(a3)
head(a4)
head(a5)
give
[1] -0.62645381074233242 0.18364332422208224 -0.83562861241004716 1.59528080213779155 0.32950777181536051 -0.82046838411801526
[1] -626453 183643 -835628 1595280 329507 -820468
[1] -62645 18364 -83562 159528 32950 -82046
[1] -6264 1836 -8356 15952 3295 -8204
[1] -626 183 -835 1595 329 -820
and
object.size(a1)
object.size(a2)
object.size(a3)
object.size(a4)
object.size(a5)
give
8000040 bytes
4000040 bytes
4000040 bytes
4000040 bytes
4000040 bytes
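For comparison, rounding alone leaves the size unchanged, because the result is still a double vector (a small sketch continuing the example above; a6 is just an illustrative name):
a6 = round(a1, 3)
object.size(a6)   # 8000040 bytes, the same as object.size(a1)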
Upvotes: 6
Views: 530
Reputation: 1719
Not as such, no. In R an integer takes 4 bytes and a double takes 8, and base R has no smaller numeric type (no single-precision float and no short integer), so reducing the precision further cannot shrink the vector. If you allocate space for 1M integers you will necessarily need about 4M bytes of RAM for the result vector, plus a small fixed per-vector overhead (hence the 4000040 bytes you see).
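You can verify that the per-element cost is fixed by the type, not by the magnitude or precision of the stored values. A minimal sketch (exact totals may differ by a few bytes of per-vector overhead across R builds):
n = 10^6
object.size(double(n))                  # about 8 MB: 8 bytes per element plus a small header
object.size(integer(n))                 # about 4 MB: 4 bytes per element plus a small header
object.size(as.integer(10 * rnorm(n)))  # fewer digits, still 4-byte integers, still about 4 MB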
Upvotes: 1