Mislav

Reputation: 1563

Unsupported use of matrix error using dplyr

Let's say I have a data frame like this:

    df <- structure(list(subjecttaxnoid = c("22661187010", "10346575807", 
"22439110996", "63510438612", "85267957976", "40178118040", "51246665873", 
"66803849969", "45813719599", "26979059418", "11240408751"), 
    reportyear = c(2014L, 2014L, 2014L, 2008L, 2008L, 2008L, 
    2008L, 2013L, 2013L, 2013L, 2013L), b001 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), b002 = c(0, 3.43884233571018e-07, 7.24705810574303e-08, 
    1.41222784374111e-07, 1.62917712565032e-05, 0, 4.53310814208705e-07, 
    7.63856039195011e-06, 0, 0, 0)), .Names = c("subjecttaxnoid", 
"reportyear", "b001", "b002"), row.names = c(1L, 2L, 3L, 200000L, 
200001L, 200002L, 200003L, 40000L, 40001L, 40002L, 40003L), class = "data.frame")

and a vector that contains the names of two columns of df:

x <- c("b001", "b002")

I would like to use the elements of x instead of the column names in dplyr:

my_list <- list()
for (i in seq_along(x)) {
  my_list[[i]] <- df %>% group_by(reportyear) %>% top_n(2, wt = x[i])
}

This returns an error:

 Error in eval(substitute(expr), envir, enclos) : 
  Unsupported use of matrix or array for column indexing

Could you please help with this issue?

Upvotes: 0

Views: 1885

Answers (1)

konvas

Reputation: 14346

I don't think there is an easy way around this (e.g. by wrapping x[1] inside as.name) unless you want to change the function top_n. The reason, as @ulfelder suggested in the comments, is that dplyr uses non-standard evaluation, so it expects an unquoted variable name here. Other functions have versions that handle quoted arguments (e.g. mutate_, rename_, etc.), but top_n does not.
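To make the non-standard evaluation point concrete, here is a minimal base R sketch. The helper capture_wt is hypothetical and is not part of dplyr; it only illustrates how a function that captures its argument unevaluated sees the expression x[1] rather than the string stored in it.

```r
# Hypothetical helper: returns the expression supplied as `wt`, unevaluated.
capture_wt <- function(wt) substitute(wt)

x <- c("b001", "b002")

bare    <- capture_wt(b001)  # a symbol naming the column
indexed <- capture_wt(x[1])  # a call to `[`, not the string "b001"

is.name(bare)      # the bare column name arrives as a symbol
is.call(indexed)   # x[1] arrives as an unevaluated call
deparse(indexed)   # text of the captured expression

# Converting the string yourself gives the symbol such a function expects.
identical(as.name(x[1]), bare)
```

This is why the conversion from string to symbol has to happen before the expression is captured, rather than at the call site.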

The easiest way around it would be to use a temporary assignment , e.g.

df %>% 
    group_by(reportyear) %>% 
    mutate_(tempvar = x[1]) %>% 
    top_n(2, wt = tempvar) %>% 
    select(-tempvar)

(Of course, you need to ensure tempvar is not already a variable name in your data frame, or it will overwrite the existing variable.) This is far from ideal, and you may have already thought of it and rejected it.
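One way to guard against the name clash is to pick the temporary name programmatically. The sketch below uses base R's make.unique; the helper unused_name is hypothetical, and how the generated name is then passed to the pipeline is left out.

```r
# Hypothetical helper: returns `base` if no existing column uses it,
# otherwise a suffixed variant that does not collide with `existing`.
unused_name <- function(existing, base = "tempvar") {
  candidates <- make.unique(c(existing, base))
  candidates[length(candidates)]
}

# Column names here are illustrative, modelled on the data frame above.
free_case  <- unused_name(c("reportyear", "b001", "b002"))
clash_case <- unused_name(c("reportyear", "tempvar"))

print(free_case)
print(clash_case)
```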

Another way is to define your own top_n_ function which is like top_n but expects a string in the wt argument:

top_n_ <- function (x, n, wt) {
    wt <- as.name(wt)
    stopifnot(is.numeric(n), length(n) == 1)
    if (n > 0) {
        call <- substitute(filter(x, min_rank(desc(wt)) <= n),
            list(n = n, wt = wt))
    }
    else {
        call <- substitute(filter(x, min_rank(wt) <= n), list(n = abs(n),
            wt = wt))
    }
    eval(call)
}

This is basically just taking top_n and changing the handling of the wt argument, at the top of the function definition. Then you can do

df %>% group_by(reportyear) %>% top_n_(2, wt = x[1])

identical(
    df %>% group_by(reportyear) %>% top_n_(2, wt = x[1]),
    df %>% group_by(reportyear) %>% top_n(2, wt = b001)
)
# TRUE
identical(
    df %>% group_by(reportyear) %>% top_n_(2, wt = x[2]),
    df %>% group_by(reportyear) %>% top_n(2, wt = b002)
)
# TRUE
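For readers on a newer dplyr, the following sketch may be simpler. It assumes dplyr 1.0.0 or later, where slice_max() is available and the .data pronoun accepts a column name stored as a string; the underscore verbs such as mutate_ are deprecated in those releases. The toy data frame and the names toy and cols are made up for illustration.

```r
library(dplyr)

# Illustrative data only; not the data frame from the question.
toy <- data.frame(
  reportyear = c(2008L, 2008L, 2008L, 2013L, 2013L, 2013L),
  b001 = c(1, 5, 3, 2, 9, 4),
  b002 = c(0.3, 0.1, 0.2, 0.6, 0.5, 0.4)
)
cols <- c("b001", "b002")

# One result per column name: the top 2 rows within each reportyear.
my_list <- lapply(cols, function(col) {
  toy %>%
    group_by(reportyear) %>%
    slice_max(order_by = .data[[col]], n = 2) %>%
    ungroup()
})
names(my_list) <- cols
```

Like top_n, slice_max keeps tied rows by default, so a group can return more than n rows when values are equal.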

Upvotes: 1
