TooTone

Reputation: 8126

Clarification of copying array semantics in R on assignment to array

Here is some code exploring the additional copying that can result from assigning to a cell in an array (in this case using a for loop).

# populate a vector with a million random numbers
n = 10^6
v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m

outputs

> vv<-v*v;
> m<-mean(vv); m
[1] 0.3329162
> # for loop: slow
> tracemem(vv)
[1] "<0x000007ffff560010>"
> for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
tracemem[0x000007ffff560010 -> 0x000007fffe570010]: 
> m<-mean(vv); m
[1] 0.3329162

which seems to indicate that there is a copy of the vector on the very first iteration of the loop.

Note: this is a follow-up to my earlier question Why is vectorization faster, this answer to it, and this comment on the answer.

Just to confirm the copying, I ran the first iteration outside the loop body:

v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
vv[1]<-v[1]*v[1]
tracemem(vv)
for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m

gives this output

> vv<-v*v;
> m<-mean(vv); m
[1] 0.33385
> # for loop: slow
> tracemem(vv)
[1] "<0x000007fffef80010>"
> vv[1]<-v[1]*v[1]
tracemem[0x000007fffef80010 -> 0x000007fffddc0010]: 
> tracemem(vv)
[1] "<0x000007fffddc0010>"
> for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
> m<-mean(vv); m
[1] 0.33385 # (different because I regenerated the random numbers)

After reading joran's answer and this nabble discussion thread, I started to get familiar with the idea of R potentially copying vectors, e.g. when you change the type as below

> x = 1:10
> tracemem(x)
[1] "<0x00000000118ba4e0>"
> x[5] = 6
tracemem[0x00000000118ba4e0 -> 0x0000000010d03568]: 
> x = 1:10 # starts off as integer
> tracemem(x)
[1] "<0x00000000118ba538>"
> x[5] = 6L # setting integer ok
> x[5] = 6 # setting floating point changes type
tracemem[0x00000000118ba538 -> 0x0000000010d03568]: 
> x[6] = 7 # it's now floating point, setting floating point again ok
> x[7] = "asdf" # setting string changes type once more, this tanks on a large array
tracemem[0x0000000010d03568 -> 0x0000000010d03610]: 
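The type changes in the session above can also be checked directly with typeof(), which does not depend on tracemem addresses. A minimal sketch, reusing the variable name x from the session (the names t0 to t3 are only for illustration):

```r
# Record the storage type of x after each assignment.
x <- 1:10
t0 <- typeof(x)      # starts off as integer

x[5] <- 6L           # integer into an integer vector: type unchanged
t1 <- typeof(x)

x[5] <- 6            # a double forces the whole vector to become double
t2 <- typeof(x)

x[7] <- "asdf"       # a string forces the whole vector to become character
t3 <- typeof(x)

print(c(t0, t1, t2, t3))
# [1] "integer"   "integer"   "double"    "character"
```

Each change of type means R has to build a new vector of the new type, which is why those assignments show up in the tracemem output.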

So I have a rough idea of what's going on. But why is there a copy of vv in my first example, when vv is already a vector of floating-point numbers? Or have I misinterpreted the output?

Upvotes: 2

Views: 504

Answers (1)

Matthew Lundberg

Reputation: 42639

A copy is made because R thinks that there may be another reference to the object:

x <- 1:10
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
# NAM(1) means that there is one reference to the object.

tracemem(x)
## [1] "<0x05a27838>"
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(1),TR] (len=10, tl=0) 1,2,3,4,5,...
# Still one reference

mean(x)
## [1] 5.5
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(2),TR] (len=10, tl=0) 1,2,3,4,5,...
# NAM(2) means "more than one" reference.
# A copy of the "pointer" was taken to pass to "mean", which bumped the count.
# The count starts at (essentially) 1 and is set to 2 when the pointer is copied. It never goes back to 1, though.

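x[1] <- 0
## tracemem[0x05a27838 -> 0x05a278c8]: 
## tracemem[0x05a278c8 -> 0x05a0d6f0]: 
# Two copies here: the first because of NAM(2); the second is presumably the
# integer-to-double coercion, since 0 is a double and x was integer.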

An assignment doesn't copy the data (that happens only when one of the objects is modified). Instead, it copies the pointer and marks the object as having more than one reference:

x <- 1
y <- x
.Internal(inspect(x))
## @5a61848 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
.Internal(inspect(y))
## @5a61848 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
y[1] <- 1
.Internal(inspect(y))
## @5a61948 14 REALSXP g0c1 [NAM(1)] (len=1, tl=0) 1
# Note, a new memory address, and NAM(1).
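The same copy-on-modify behaviour can be observed from the values alone, without .Internal(inspect()). This is a minimal sketch: the tracemem() calls are guarded with capabilities("profmem") on the assumption that some R builds lack memory profiling, and what tracemem prints can differ between R versions and front ends.

```r
x <- c(1, 2, 3)
y <- x                      # no data copied yet; x and y share one vector

# Trace the shared vector only if this build supports memory profiling.
if (capabilities("profmem")) tracemem(x)

y[1] <- 99                  # modifying through y is what triggers the copy

if (capabilities("profmem")) untracemem(x)

print(x)                    # x is unchanged: 1 2 3
print(y)                    # y holds the modified copy: 99 2 3
```

Because the copy is deferred until the modification, y <- x is cheap even for large vectors; the cost is paid by the first write to either name.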

Upvotes: 5
