lpoulsen

Reputation: 13

R: error when applying a function to a list of dataframes

I am relatively new to R, and after spending some time getting familiar with basic concepts, I am now trying to write my first function. I want to use the function to make some simple calculations over a list of dataframes. My data looks like this (I have more than 100 dataframes, so this is simplified):

d1 <- data.frame(bp1=c(1,2,3),bp2=c(4,5,6), lp=c(4,5,6))
d2 <- data.frame(bp1=c(3,2,1),bp2=c(6,5,4), lp=c(2,1,6))
my.list <- list(d1, d2)

What I want to do is compute 10^-lp from the 3rd column and multiply it by the values in the 3rd column. Then I want to aggregate (sum) the results by the 1st column. My function looks like this:

bp_calc <- function(x) {
bp1 <- x[[i]][1] 
lp <- x[[i]][3]
10^-lp * lp -> x[[i]]$p_logp
aggregate(x[[i]]$p_logp ~ bp1, data = x, sum) -> result
return(result)  
}

To use the function on my data, I use:

lapply(my.list,bp_calc)

However, this throws the error: Error in .subset2(x, i, exact = exact) : subscript out of bounds. I have of course tried googling this and searching this forum, but I just cannot understand what I am doing wrong. Help would be much appreciated, thanks!

Upvotes: 1

Views: 118

Answers (2)

shadowtalker

Reputation: 13823

subscript out of bounds means that you're trying to access a nonexistent list element. For example:

l <- as.list(letters[1:3])
l[4]  # returns list(NULL)
l[[4]]  # error

So why is this happening? Look carefully at your code. lapply(my.list, bp_calc) extracts each element of my.list and passes it to the first argument of bp_calc. In this case, each list element is a data frame and i is never defined anywhere in this process.

So R searches for a variable called i in the environment where bp_calc was defined (typically the global environment). If no i existed there, the error would be object 'i' not found. Since you get subscript out of bounds instead, R must be finding an i defined somewhere else. And whatever that i is, it apparently isn't any one of 1, 2, 3, bp1, bp2, or lp.
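To make the two error messages concrete, here is a minimal sketch. It assumes a leftover global i (the value 5 is made up for illustration) and a hypothetical one-line function f standing in for bp_calc:

```r
# Hypothetical leftover from earlier work in the session.
i <- 5

d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))

# `i` is neither an argument nor a local variable of f, so R looks it up
# in the environment where f was defined.
f <- function(x) x[[i]]

# d1 has only 3 columns, so x[[5]] fails with "subscript out of bounds".
msg_with_i <- tryCatch(f(d1), error = function(e) conditionMessage(e))
print(msg_with_i)

# Without any `i` at all, the failure is a different one: the object is not found.
rm(i)
msg_without_i <- tryCatch(f(d1), error = function(e) conditionMessage(e))
print(msg_without_i)
```

The first message matches the one in the question, which is why a stray i is the likely culprit.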

What you need to do here is to either define i inside the function, or define it globally (not recommended because that's how bugs like this arise in the first place), or pass it in as an explicit argument (recommended):

bp_calc <- function(x, i) {
    # stuff
}
lapply(my.list, bp_calc, i = something)
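As a minimal sketch of the explicit-argument option: lapply forwards any extra named arguments to the function it calls, so i is supplied at the call site instead of being looked up globally. The helper get_col is hypothetical and only illustrates the mechanism:

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

# Hypothetical helper: `i` is an explicit argument selecting a column by position.
get_col <- function(x, i) x[[i]]

# `i = 3` is passed through lapply's `...` to every call of get_col.
cols <- lapply(my.list, get_col, i = 3)
print(cols)
```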

And what is R trying to do with i? It's trying to access the element i of x, and then access element 1 or 3 of x[[i]]. Remember, x is one data frame, not a list of data frames, because lapply breaks apart my.list before bp_calc is called. It seems like you were thinking x[[i]] would access the "current" list element, but in reality x itself is the current list element, so x[[i]] is actually the "i-th element of the current element of my.list." So x[[i]][3] is "the third element of the ith element of the current element of my.list".
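A quick way to see this for yourself, using the sample data from the question, is to ask lapply what it actually hands to the function:

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

# Each call receives a single data frame, not the whole list.
classes <- lapply(my.list, class)
print(classes)

# So inside the function, the "elements" of x are its columns.
col_names <- lapply(my.list, names)
print(col_names)
```

Since x is already one data frame, x[[3]] is simply its lp column.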

What you want is this:

bp_calc <- function(x) {
    bp1 <- x[[1]]
    lp <- x[[3]]
    10^-lp * lp -> x$p_logp
    aggregate(x$p_logp ~ bp1, data = x, sum)
    # by the way, R functions automatically return the last evaluated expression
}
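A small variation you might prefer, sketched here under the assumption that the data look like the sample in the question: if the formula refers to p_logp by its bare column name, the result column is named p_logp rather than being labelled after the expression on the left-hand side. The name bp_calc2 is made up to keep it apart from the function above, and the expected values are worked out by hand from the sample data:

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

# Hypothetical variant: same calculation, bare column names in the formula.
bp_calc2 <- function(x) {
  lp <- x[[3]]
  x$p_logp <- 10^-lp * lp
  aggregate(p_logp ~ bp1, data = x, FUN = sum)
}

results <- lapply(my.list, bp_calc2)
print(results)

# Hand check for d2: rows are sorted by bp1 (1, 2, 3), giving
# 10^-6 * 6, 10^-1 * 1 and 10^-2 * 2 respectively.
stopifnot(isTRUE(all.equal(results[[2]]$p_logp, c(6e-6, 0.1, 0.02))))
```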

Upvotes: 0

akrun

Reputation: 886938

You could use transform to create the new variable p_logp and pass the result as the data argument of aggregate:

bp_calc <- function(x) {
  aggregate(p_logp ~ bp1, transform(x, p_logp = 10^-lp * lp), sum)
}

lapply(my.list, bp_calc)
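To see what transform contributes on its own, here is a short sketch on the first sample data frame from the question (with_p is just an illustrative name). transform returns a copy of the data frame with the new column appended, and the expression can use the existing column names directly:

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))

# `lp` is looked up among the columns of d1; d1 itself is left unchanged.
with_p <- transform(d1, p_logp = 10^-lp * lp)

print(names(with_p))
print(with_p$p_logp)
```

This is the data frame that aggregate then sees, so p_logp and bp1 are both ordinary columns in the formula.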

Upvotes: 1
