Andarin

Reputation: 829

Matrix doubles memory usage when replacing values

Why does my matrix double in size when I replace values in it? Can I prevent R from doing this? Example:

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
object.size(a)/1024/1024
# 0.038 Mb
# I want to have a mean smaller than 1 in every column
# Thus, swap 0's and 2's in every column where the mean is bigger than 1
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
tracemem(a)
# [1] "<0x7fe9d2f16f50>"
a[,swapcol][swapmat==2] <- 0
# tracemem[0x7fe9d2f16f50 -> 0x7fe9c2d98b90]: 
# tracemem[0x7fe9c2d98b90 -> 0x7fe9c2d2bf70]: 
a[,swapcol][swapmat==0] <- 2
# tracemem[0x7fe9c2d2bf70 -> 0x7fe9c2e1b460]: 
object.size(a)/1024/1024
# 0.076 Mb, memory occupation doubled

I understand that the matrix may get copied in order to replace the values, but why does it get bigger? (replace() results in the same behaviour.) I read the chapter on memory usage in Hadley's book and the R documentation on this question, but I am still wondering why this is happening. I thought R might request a bit more space from the OS in case I want to enlarge the matrix, but why twice the space? The same factor applies to big matrices, which makes my system swap memory (defeating any potential time saving).

Thanks for any hints!

Upvotes: 1

Views: 161

Answers (2)

joran

Reputation: 173697

Converting comment to answer:

The literals 0 and 2 are floats (i.e. doubles) in R, while your matrix contains integers, so assigning them coerces the whole matrix to double. Use 0L and 2L to force R to treat them as integers:

> set.seed(42)
> a <- matrix(rbinom(10000,2,0.45),ncol=10)
> object.size(a)/1024/1024
0.0383377075195312 bytes

> swapcol <- colMeans(a)>1
> swapmat <- a[,swapcol]
> tracemem(a)
[1] "<0x7fc50ec45e00>"
> a[,swapcol][swapmat==2] <- 0L
tracemem[0x7fc50ec45e00 -> 0x7fc50d839e00]: 

> a[,swapcol][swapmat==0] <- 2L
> object.size(a)/1024/1024
0.0383377075195312 bytes

Same size!
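
As a minimal sketch of the coercion at work (the vectors x and y are made up for illustration and are not part of the question's code), you can check the type of a literal and of a vector before and after assignment with typeof():

```r
# Numeric literals are doubles unless they carry the L suffix
typeof(0)     # "double"
typeof(0L)    # "integer"

# Hypothetical integer vector, used only for this illustration
x <- c(1L, 2L, 3L)
typeof(x)     # "integer"

# Assigning a double literal coerces the whole vector to double
x[1] <- 0
typeof(x)     # "double"

# Assigning an integer literal keeps the storage type
y <- c(1L, 2L, 3L)
y[1] <- 0L
typeof(y)     # "integer"
```

The same rule applies to a matrix, since a matrix is a vector with a dim attribute.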

Upvotes: 3

Gavin Simpson

Reputation: 174928

The problem is that 0, 2, etc. are not integers but doubles as far as R is concerned. When you assign them to elements of the matrix a, you force R to store the modified a using doubles, which increases the object's memory size. The original a was stored using integers, which take up less memory each (4 bytes rather than 8). You can see this via storage.mode():

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)

> storage.mode(a)
[1] "integer"

swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0
a[,swapcol][swapmat==0] <- 2

> storage.mode(a)
[1] "double"
> format(object.size(a), units = "Kb")
[1] "78.3 Kb"
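
The factor of two can be checked directly. This is a rough sketch with made-up vectors (ints, dbls and size_ratio are illustrative names, not from the question), assuming the usual 4-byte integer and 8-byte double storage:

```r
n <- 10000

# Same number of elements, different storage types
ints <- integer(n)   # 4 bytes per element
dbls <- double(n)    # 8 bytes per element

typeof(ints)         # "integer"
typeof(dbls)         # "double"

# object.size() includes a small fixed header, so the ratio is
# close to 2 rather than exactly 2
size_ratio <- as.numeric(object.size(dbls)) / as.numeric(object.size(ints))
size_ratio
```

This matches the jump from roughly 39 Kb to roughly 78 Kb seen above for the 10000-element matrix.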

To fix this, append L to the values you assign to a; this is R's notation for an integer.

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0L
a[,swapcol][swapmat==0] <- 2L

> storage.mode(a)
[1] "integer"
> format(object.size(a), units = "Kb")
[1] "39.3 Kb"
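
As an alternative, and only as a sketch rather than what the question's code does, the swap of 0's and 2's can be written as one integer subtraction, since 2L - x maps 0 to 2, 2 to 0 and leaves 1 unchanged. The small matrix below is made up so that exactly one column gets swapped:

```r
# Hypothetical 4 x 2 integer matrix, for illustration only
a <- matrix(c(2L, 2L, 1L, 0L,    # column 1: mean 1.25, gets swapped
              0L, 1L, 0L, 2L),   # column 2: mean 0.75, left alone
            ncol = 2)

swapcol <- colMeans(a) > 1

# Integer minus integer stays integer, so no coercion to double occurs
a[, swapcol] <- 2L - a[, swapcol]

storage.mode(a)   # "integer"
a[, 1]            # 0 0 1 2
colMeans(a)       # 0.75 0.75
```

This also avoids the intermediate swapmat copy, although whether it reduces copying of a itself is not something this sketch measures.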

Upvotes: 4
