Reputation: 6669

Can one get the grouping variables inside empty grouping in plyr ddply?

Suppose we have data like this:

library(plyr)

#some data
x = data.frame(
  letters = factor(c("a", "c"), levels = letters[1:4])
)

I.e., the factor has levels b and d that don't appear in the data. We can loop over the groups of letters:

#loop inside
plyr::ddply(x, "letters", function(xx) {
  #do something here
  if (xx$letters == "b") print("do something")

  data.frame(
    count = nrow(xx)
  )
})

gives us:

  letters count
1       a     1
2       c     1

So we are missing the b and d levels. We then add .drop = F so that they are not skipped:

plyr::ddply(x, "letters", .drop = F, function(xx) {
  #do something here
  #if (xx$letters == "b") print("do something")

  data.frame(
    count = nrow(xx)
  )
})

we get:

  letters count
1       a     1
2       b     0
3       c     1
4       d     0

However, suppose we want to do something inside the loop based on the letter group, for instance when we reach the empty b group. The problem is that we don't know when we are inside it. If we add if (nrow(xx) == 0) browser(), we can look at the xx object:

[1] letters
<0 rows> (or 0-length row.names)

But we can't tell whether it is b or d. Is it possible to find out?

Upvotes: 0

Answers (1)

CoderGuy123

Reputation: 6669

Yes, it can be done by looking up objects in the calling environments. To figure out where they live, call browser() inside the loop and list the objects in the current environment with ls():

Called from: .fun(piece, ...)
Browse[1]> c
Called from: .fun(piece, ...)
Browse[1]> xx
[1] letters
<0 rows> (or 0-length row.names)
Browse[1]> ls(all.names = T)
[1] "xx"

So there is nothing here except for the empty data frame piece (subset of original data). It would have been nice if there was a hidden object here to indicate the piece but alas. However, we can look at the parent environments and see if we get lucky:

Browse[1]> ls(all.names = T, envir = parent.frame(1))
[1] "i"     "piece"
Browse[1]> ls(all.names = T, envir = parent.frame(2))
 [1] "..."       ".data"     ".fun"      ".inform"   ".parallel" ".paropts"  ".progress" "do.ply"    "n"         "pieces"    "progress" 
[12] "result"

OK, there is definitely something in them. One can fetch these using get() or mget() for multiple at a time:

Browse[1]> mget(ls(envir = parent.frame(1)), envir = parent.frame(1))
$i
[1] 2

$piece
[1] letters
<0 rows> (or 0-length row.names)

Browse[1]> mget(ls(envir = parent.frame(2)), envir = parent.frame(2))
$do.ply
function (i) 
{
    piece <- pieces[[i]]
    if (.inform) {
        res <- try(.fun(piece, ...))
        if (inherits(res, "try-error")) {
            piece <- paste(utils::capture.output(print(piece)), 
                collapse = "\n")
            stop("with piece ", i, ": \n", piece, call. = FALSE)
        }
    }
    else {
        res <- .fun(piece, ...)
    }
    progress$step()
    res
}
<bytecode: 0x559669467ca8>
<environment: 0x55966c7c6798>

$n
[1] 4

$pieces
$a
  letters
1       a

$b
[1] letters
<0 rows> (or 0-length row.names)

$c
  letters
1       c

$d
[1] letters
<0 rows> (or 0-length row.names)


$progress
$progress$init
function (x) 
NULL
<bytecode: 0x559669453cd0>
<environment: 0x55966e5c8b50>

$progress$step
function () 
NULL
<bytecode: 0x559669453e58>
<environment: 0x55966e5c8b50>

$progress$term
function () 
NULL
<bytecode: 0x559669453e58>
<environment: 0x55966e5c8b50>


$result
$result[[1]]
NULL

$result[[2]]
NULL

$result[[3]]
NULL

$result[[4]]
NULL

So we see that i in parent.frame(1) is the index of the current piece, and the names of pieces in parent.frame(2) are the factor levels we want. Putting them together, we can get the current level:

plyr::ddply(x, "letters", .drop = FALSE, function(xx) {
  #figure out the piece
  #i is the index of the current piece (found in parent.frame(1) above)
  i = get("i", envir = parent.frame(1))
  #pieces is the named list of all pieces (found in parent.frame(2) above)
  levels = names(get("pieces", envir = parent.frame(2)))
  current_piece = levels[i]

  #do something
  if (current_piece == "b") print("this is the b empty group!") else print("This is not level b")

  data.frame(
    count = nrow(xx)
  )
})

which results in:

[1] "This is not level b"
[1] "this is the b empty group!"
[1] "This is not level b"
[1] "This is not level b"
  letters count
1       a     1
2       b     0
3       c     1
4       d     0
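
Note that i and pieces are variables internal to plyr rather than part of its documented interface, so this lookup could stop working in another plyr version. If that is a concern, one alternative is to do the split by hand so the group name is always available. This is a minimal sketch in base R, assuming the same x as in the question; the names pieces, res and result are only illustrative:

```r
x <- data.frame(
  letters = factor(c("a", "c"), levels = letters[1:4])
)

# split() keeps empty factor levels by default (drop = FALSE)
# and names each piece after its level
pieces <- split(x, x$letters)

# iterate over the names, so the level is known even for empty pieces
res <- lapply(names(pieces), function(lvl) {
  xx <- pieces[[lvl]]
  if (lvl == "b") message("this is the b empty group!")
  data.frame(letters = lvl, count = nrow(xx))
})

# bind the per-group results into one data frame
result <- do.call(rbind, res)
print(result)
```

This gives the same four rows as the ddply call with .drop = F, but the level is passed to the function explicitly instead of being fetched from a calling frame.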

Upvotes: 0
