Reputation: 13
I am relatively new to R, and after spending some time getting familiar with basic concepts, I am now trying to write my first function. I want to use the function to make some simple calculations over a list of dataframes. My data looks like this (I have more than 100 dataframes, so this is simplified):
d1 <- data.frame(bp1=c(1,2,3),bp2=c(4,5,6), lp=c(4,5,6))
d2 <- data.frame(bp1=c(3,2,1),bp2=c(6,5,4), lp=c(2,1,6))
my.list <- list(d1, d2)
What I want to do is raise 10 to the negative of the values in the 3rd column (10^-lp in my code) and multiply the result by the values in the 3rd column. Then I want to aggregate the results based on the 1st column. My function looks like this:
bp_calc <- function(x) {
bp1 <- x[[i]][1]
lp <- x[[i]][3]
10^-lp * lp -> x[[i]]$p_logp
aggregate(x[[i]]$p_logp ~ bp1, data = x, sum) -> result
return(result)
}
To use the function on my data, I use:
lapply(my.list,bp_calc)
However, this throws the error: Error in .subset2(x, i, exact = exact) : subscript out of bounds. I have of course tried googling this and searching this forum, but I just cannot understand what I am doing wrong. Help would be much appreciated, thanks!
Upvotes: 1
Views: 118
Reputation: 13823
"subscript out of bounds" means that you're trying to access a nonexistent list element. For example:
l <- as.list(letters[1:3])
l[4] # returns list(NULL)
l[[4]] # error
So why is this happening? Look carefully at your code. lapply(my.list, bp_calc) extracts each element of my.list and passes it to the first argument of bp_calc. In this case, each list element is a data frame, and i is never defined anywhere in this process.
So R searches for a variable called i in the environment where bp_calc was defined. In that case, either it finds i, or it doesn't and returns an error. Here R is finding i defined somewhere else, because otherwise it would say object 'i' not found. And whatever that i is, it apparently isn't any one of 1, 2, 3, bp1, bp2, or lp.
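To see the two failure modes side by side, here is a minimal sketch. The helper get_elem and the value i <- 10 are made up for illustration and are not from the question; the point is only that a stray global i is picked up silently.

```r
# Hypothetical helper (not from the question): uses i without defining it.
get_elem <- function(x) x[[i]]

df <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))

# Case 1: no i in the current environment, so the lookup itself fails.
if (exists("i", inherits = FALSE)) rm(i)
msg_missing <- tryCatch(get_elem(df), error = function(e) conditionMessage(e))

# Case 2: a leftover i (for example from an earlier for loop) is found,
# but df has only 3 columns, so indexing with it fails instead.
i <- 10
msg_bounds <- tryCatch(get_elem(df), error = function(e) conditionMessage(e))

print(msg_missing)  # the "object 'i' not found" case
print(msg_bounds)   # the "subscript out of bounds" case
```

Running rm(i) (or starting a fresh session) before calling your original function is a quick way to check which of the two cases you are in.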
What you need to do here is to either define i inside the function, or define it globally (not recommended because that's how bugs like this arise in the first place), or pass it in as an explicit argument (recommended):
bp_calc <- function(x, i) {
# stuff
}
lapply(my.list, bp_calc, i = something)
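As a concrete version of that pattern, here is a small sketch using the data from the question. The helper name get_col is hypothetical; lapply simply forwards any extra named arguments to the function it calls.

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

# Hypothetical helper: return column i of one data frame.
# i is an explicit argument, so nothing is looked up outside the function.
get_col <- function(x, i) x[[i]]

# Extra arguments after the function are passed on to every call.
third <- lapply(my.list, get_col, i = 3)      # by position
first <- lapply(my.list, get_col, i = "bp1")  # by name

print(third)
print(first)
```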
And what is R trying to do with i? It's trying to access element i of x, and then access element 1 or 3 of x[[i]]. Remember, x is one data frame, not a list of data frames, because lapply breaks apart my.list before bp_calc is called. It seems like you were thinking x[[i]] would access the "current" list element, but in reality x itself is the current list element, so x[[i]] is actually "the i-th element of the current element of my.list". So x[[i]][3] is "the third element of the i-th element of the current element of my.list".
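If it helps, you can check what lapply actually hands to the function with a quick sketch like this one, which uses the data from the question and throwaway anonymous functions:

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

# Each call receives a single data frame, never the whole list.
is_df <- sapply(my.list, function(x) is.data.frame(x))

# So the "elements" of x are its columns.
col_names <- lapply(my.list, function(x) names(x))

# x[[3]][3] is therefore the third value of the third column.
third_of_third <- sapply(my.list, function(x) x[[3]][3])

print(is_df)
print(col_names)
print(third_of_third)
```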
What you want is this:
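bp_calc <- function(x) {
  lp <- x[[3]]  # third column of the current data frame
  x$p_logp <- 10^-lp * lp
  # refer to columns by name in the formula; aggregate looks them up in data
  aggregate(p_logp ~ bp1, data = x, FUN = sum)
  # by the way, R functions automatically return the last evaluated expression
}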
Upvotes: 0
Reputation: 886938
You could use transform to create the new variable p_logp and use it as the data in aggregate:
bp_calc <- function(x) {
aggregate(p_logp~bp1, transform(x, p_logp=10^-lp*lp), sum)
}
lapply(my.list, bp_calc)
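Since you have more than 100 data frames, you may want the results in one table. This is only a sketch: the labels "d1" and "d2" and the source column are assumptions added for illustration, not part of your data.

```r
d1 <- data.frame(bp1 = c(1, 2, 3), bp2 = c(4, 5, 6), lp = c(4, 5, 6))
d2 <- data.frame(bp1 = c(3, 2, 1), bp2 = c(6, 5, 4), lp = c(2, 1, 6))
my.list <- list(d1, d2)

bp_calc <- function(x) {
  aggregate(p_logp ~ bp1, transform(x, p_logp = 10^-lp * lp), sum)
}

# Hypothetical labels so each result can be traced back to its input.
names(my.list) <- c("d1", "d2")
results <- lapply(my.list, bp_calc)

# Tag each result with its label, then stack everything into one data frame.
tagged <- Map(function(res, id) data.frame(source = id, res),
              results, names(results))
combined <- do.call(rbind, tagged)

print(combined)
```

Each bp1 value is unique within d1 and d2 here, so the sums just reproduce the per-row values; with repeated bp1 values the rows would be collapsed per group.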
Upvotes: 1