Change data.frame in *_ply function

Question

Let's say I have

a <- data.frame( z = rep( c("A", "B", "C"), 2 ), p = 1:6, stringsAsFactors=FALSE )
b <- data.frame( z = c( rep( "A", 5), rep( "B", 5 ) ), q = 1:10, stringsAsFactors=FALSE  )

and I want to modify a while iterating over b with plyr functions, for example:

library(plyr)
d_ply( b, "z", function( x ){
  a[ a$z == x[1, "z"], "p" ] <<- a[ a$z == x[1, "z"], "p" ] + sum(x$q)
})

In this case I have to use <<- for the assignment in order to change a outside the d_ply call. With plain <-, a is left unchanged. What I definitely want to avoid is iterating over a, since b$z contains only a very small subset of the values in a$z.

So my questions are:

1. Is there a simple and performant solution using plyr which avoids the <<-?
2. Is there another handy solution (other than for( i in unique(b$z) ){ ... })?
3. If I stick with my solution, are there any implications of using <<- in this way? Can I be sure that under any circumstances only the closest a (in terms of environments) to the d_ply call will be modified? I ask especially because this is all part of a ReferenceClass method.

Ricardo Saporta · Accepted Answer

Here is an option using data.table instead of plyr:

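library(data.table)
a <- data.table(a, key="z")
b <- data.table(b, key="z")

# Sum q per group in b (the unnamed result column is V1), join it to a on
# the key z, and update p by reference for the matching rows only.
a[b[, sum(q), by=z], p := p + V1]
a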

   z  p
1: A 16
2: A 19
3: B 42
4: B 45
5: C  3
6: C  6
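
The same aggregate-then-update idea can also be sketched in base R, without <<- and without an explicit loop. This is only an illustration under assumptions: the names sums, idx and hit are made up here, and a and b are rebuilt as the plain data.frames from the question so the snippet runs on its own.

```r
# Data from the question, as plain data.frames
a <- data.frame(z = rep(c("A", "B", "C"), 2), p = 1:6, stringsAsFactors = FALSE)
b <- data.frame(z = c(rep("A", 5), rep("B", 5)), q = 1:10, stringsAsFactors = FALSE)

# Aggregate b once: one sum of q per distinct value of z (A = 15, B = 40)
sums <- tapply(b$q, b$z, sum)

# For each row of a, find the position of its z in the aggregated result
# (NA where z does not occur in b, here for "C")
idx <- match(a$z, names(sums))
hit <- !is.na(idx)

# Update only the matching rows; as.vector() drops the array attributes
a$p[hit] <- a$p[hit] + as.vector(sums[idx[hit]])

a$p
# [1] 16 42  3 19 45  6
```

Because a is not keyed here, the rows keep their original order, so the values appear in a different order than in the data.table output above.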

Edit:

Regarding your third question, I would advise against using <<-. If you want to assign into a different environment, use assign(., envir=.), which lets you specify exactly which environment to assign into.
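
A minimal sketch of what assign(., envir=.) can look like, assuming the data lives in a local variable of the calling function. The function name update_with_assign and the variables target and keys are hypothetical; environment(), get() and assign() are base R. Unlike <<-, which searches the enclosing environments and falls back to creating a global variable when it finds no existing binding, the target environment is named explicitly here.

```r
update_with_assign <- function() {
  a <- data.frame(z = c("A", "B", "C"), p = 1:3, stringsAsFactors = FALSE)
  target <- environment()   # the environment that owns this `a`

  keys <- c("A", "B")       # stand-in for the groups being iterated over
  invisible(lapply(keys, function(key) {
    # Read the current value from the chosen environment only
    cur <- get("a", envir = target, inherits = FALSE)
    rows <- cur$z == key
    cur[rows, "p"] <- cur[rows, "p"] + 10L
    # Write it back to exactly that environment
    assign("a", cur, envir = target)
  }))

  a
}

res <- update_with_assign()
res$p
# [1] 11 12  3
```

Inside a Reference Class method the same pattern should apply, with target set to whichever environment actually holds the object to be changed.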
