Reputation: 829
Why does my matrix double in size if I replace values in it? Can I prevent R from doing so? Example:
set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
object.size(a)/1024/1024
# 0.038 Mb
# I want to have a mean smaller than 1 in every column
# Thus, swap 0s and 2s in every column where the mean is bigger than 1
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
tracemem(a)
# [1] "<0x7fe9d2f16f50>"
a[,swapcol][swapmat==2] <- 0
# tracemem[0x7fe9d2f16f50 -> 0x7fe9c2d98b90]:
# tracemem[0x7fe9c2d98b90 -> 0x7fe9c2d2bf70]:
a[,swapcol][swapmat==0] <- 2
# tracemem[0x7fe9c2d2bf70 -> 0x7fe9c2e1b460]:
object.size(a)/1024/1024
# 0.076 Mb, memory occupation doubled
I understand that the matrix may get copied in order to replace the values, but why does it get bigger? (replace() results in the same behaviour.) I read the chapter of Hadley's book about memory usage and the R documentation on this question, but I am still wondering why this is happening. I thought maybe R requests a bit more space from the OS in case I want to enlarge the matrix, but why twice the space? The same factor applies to big matrices, which makes my system swap memory (thus cancelling out any potential time saving).
Thanks for any hints!
Upvotes: 1
Views: 161
Reputation: 173697
Converting comment to answer:
0 and 2 are floats (i.e. doubles). Your matrix contains integers. Use 0L and 2L to force R to treat them as integers:
> set.seed(42)
> a <- matrix(rbinom(10000,2,0.45),ncol=10)
> object.size(a)/1024/1024
0.0383377075195312 bytes
> swapcol <- colMeans(a)>1
> swapmat <- a[,swapcol]
> tracemem(a)
[1] "<0x7fc50ec45e00>"
> a[,swapcol][swapmat==2] <- 0L
tracemem[0x7fc50ec45e00 -> 0x7fc50d839e00]:
> a[,swapcol][swapmat==0] <- 2L
> object.size(a)/1024/1024
0.0383377075195312 bytes
Same size!
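A minimal sketch of the coercion at work, using small throwaway vectors (x and y are illustrative names, not part of the original example). It shows that a bare numeric literal is a double, and that assigning a double into an integer vector converts the whole vector:

```r
# Numeric literals are doubles unless they carry the L suffix
typeof(0)     # "double"
typeof(0L)    # "integer"

x <- 1:3      # integer vector
x[1] <- 0     # assigning a double coerces the whole vector to double
typeof(x)     # "double"

y <- 1:3
y[1] <- 0L    # assigning an integer keeps the integer storage
typeof(y)     # "integer"
```

The same rule applies to matrices, since a matrix is a vector with a dim attribute.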
Upvotes: 3
Reputation: 174928
The problem is that 0, 2 etc. are not integers but doubles as far as R is concerned, and when you assign them to elements of the matrix a you force R to store the modified a using doubles, which increases the object's memory size. The original a was stored using integers, which take up less memory each (4 bytes per element versus 8 for a double). You can see this via storage.mode():
set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
> storage.mode(a)
[1] "integer"
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0
a[,swapcol][swapmat==0] <- 2
> storage.mode(a)
[1] "double"
> format(object.size(a), units = "Kb")
[1] "78.3 Kb"
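The factor of two follows from the element sizes. A rough sketch, assuming a typical build of R where integers take 4 bytes and doubles 8 bytes per element (int_vec, dbl_vec and the size variables are illustrative names):

```r
n <- 10000                      # same number of elements as the matrix a

int_vec <- integer(n)           # stored as 4-byte integers
dbl_vec <- numeric(n)           # stored as 8-byte doubles

size_int <- as.numeric(object.size(int_vec))
size_dbl <- as.numeric(object.size(dbl_vec))

size_int                        # about n * 4 bytes plus a small header
size_dbl                        # about n * 8 bytes plus a small header
size_dbl - size_int             # roughly n * 4 bytes of extra storage
```

The fixed header is small next to the data, so the ratio approaches 2 for any large vector or matrix, which matches the 39.3 Kb versus 78.3 Kb reported above.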
To fix this, append L to the values you assign to a; this is R's notation for an integer.
set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0L
a[,swapcol][swapmat==0] <- 2L
> storage.mode(a)
[1] "integer"
> format(object.size(a), units = "Kb")
[1] "39.3 Kb"
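As a possible alternative (not something from the answers above), the swap of 0s and 2s can be written as a single integer subtraction, since 2L - x maps 0 to 2, 1 to 1 and 2 to 0. The sketch below also shows storage.mode<- as a way to convert a matrix back if it has already been coerced; b is a hypothetical copy used only for that demonstration, and the conversion is only safe when every value is a whole number within the integer range.

```r
set.seed(42)
a <- matrix(rbinom(10000, 2, 0.45), ncol = 10)
swapcol <- colMeans(a) > 1

# Integer minus integer stays integer, so no coercion to double happens
a[, swapcol] <- 2L - a[, swapcol]
storage.mode(a)                 # "integer"
all(colMeans(a) <= 1)           # TRUE

# Repairing a matrix that was already coerced to double
b <- a
b[1, 1] <- 0                    # double literal: b is now stored as double
storage.mode(b)                 # "double"
storage.mode(b) <- "integer"    # convert back, keeping the dimensions
storage.mode(b)                 # "integer"
```

The subtraction also avoids the intermediate swapmat copy, which may matter for the large matrices mentioned in the question.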
Upvotes: 4