Reputation: 234
I am trying to maximize a function I created for every row in a data frame. It works fine when I do it with apply, but it doesn't work when I switch to parApply. I can't understand why.
HERE IS THE FUNCTION (estimates the semivariance):
VB04 <- function(x) {
  # The argument is a vector of 2 parameters: the first is x, the second is lambda
  #### I first define the function little f
  # ifelse() is used without return() so that ff stays vectorized over z;
  # return() inside ifelse() exits ff as soon as the first branch is evaluated
  ff <- function(z) {
    ifelse(z <= x[2] * strike, x[1], x[1] * (strike - z) / (strike * (1 - x[2])))
  }
  #### I then estimate the expected payoff of the contract
  require(pracma)
  require(np)
  profit <- quadgk(function(y) {
    # Estimate the density
    # Here since I have estimated the weather index overall, I will look at the entire distribution
    density.pt <- npudens(
      bws = npudensbw(
        dat = as.data.frame(data.stations$weather_ind[
          which(data.stations$census_fips == census & data.stations$year < year_ext)]),
        ckertype = "epanechnikov", ckerorder = 4),
      tdat = as.data.frame(data.stations$weather_ind[
        which(data.stations$year < year_ext)]),
      edat = as.data.frame(y))
    # Return the value of the expected profit
    return(ff(y) * density.pt$dens)
  }, a = 0, b = strike)
  ## I now create a function that estimates the max
  # I do this county by county, to get the best contract in each case.
  # Only the density is estimated in common.
  # First element of the max argument
  max.arg <- sapply(-data.stations$yield[which(data.stations$census_fips == census
                                               & data.stations$year < year_ext)] -
                      sapply(data.stations$weather_ind[which(data.stations$census_fips == census
                                                             & data.stations$year < year_ext)], ff),
                    function(x) x + yield_avg + profit[[1]])
  # Add a second column of zeroes
  max.arg <- cbind(max.arg, 0)
  # Take the max
  max.arg <- apply(max.arg, 1, max)
  # Return the final value, the sum of squares
  return(sum(max.arg^2))
}
I want to apply it to each row of a data frame. Here are the first rows:
test[1:10,]
   census_fips yield_avg   strike
1        17143  161.8571 161.8571
2        17201  139.4286 139.4286
3        18003  147.4857 147.4857
4        18103  150.1571 150.1571
5        18105  137.8000 137.8000
6        18157  157.8714 157.8714
7        18163  149.5857 149.5857
8        19013  168.4286 168.4286
9        19033  163.9286 163.9286
10       19045  161.2286 161.2286
The optimization within parApply goes like this:
library(foreach)
library(doParallel)
cl <- makeCluster(3) # My computer has 4 cores
registerDoParallel(cl)
clusterExport(cl=cl, varlist=c("VB04"))
tempres <- parApply(cl=cl, X=test, MARGIN=1, FUN=function(x) {
  strike <- x[3] # prepare the parameters
  yield_avg <- x[2]
  census <- x[1]
  require(optimx)
  minopt <- optimx(par=c(1,0.5), fn = VB04, lower=c(0,0),
                   upper=c(Inf,1), method="L-BFGS-B")
  return(cbind(minopt$fvalues[[1]], minopt$par[[1]]))
})
With optimx I get the error: "Cannot evaluate function at initial parameters". The optimization works fine when done for any single row, and it also works with apply. When I try optim instead of optimx, I get a different error: "object 'strike' not found".
I would really appreciate any help. I am not sure if the problem is that the parameters are not passed on (even though they are defined inside parApply), or something else. I can't find how to fix it.
Thanks,
EDIT: Forgot to put the code for calling the clusters and for passing the function to the clusters. I have added it above
Upvotes: 0
Views: 1457
Reputation: 19677
One problem in your code is that variables such as "strike", "yield_avg" and "census" are not in the scope of VB04, since they are local variables in the worker function. You could fix that by defining VB04 inside that function as well. That will solve the scoping problem, and you also won't have to export VB04.
Here's a ridiculously stripped down version of your code that demonstrates this:
library(parallel)
cl <- makePSOCKcluster(3)
test <- matrix(1:4, 2)
tempres <- parApply(cl, test, 1, function(x) {
  VB04 <- function() {
    strike * yield_avg
  }
  strike <- x[1]
  yield_avg <- x[2]
  VB04()
})
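The scoping issue can also be seen without a cluster. The following is a minimal sketch with hypothetical names (objective_demo, worker_demo, strike_demo); it only mirrors the structure of the question and is not the original code. A function defined outside the worker looks up its free variables where it was defined, not where it is called, so the worker's local variable is not found:

```r
# Hypothetical names throughout; this only mirrors the structure of the question.
objective_demo <- function() {
  strike_demo * 2   # free variable: looked up where objective_demo was defined
}

worker_demo <- function(row) {
  strike_demo <- row[1]   # local to worker_demo, not visible to objective_demo
  objective_demo()
}

# tryCatch() captures the error message instead of stopping the script.
scoping_result <- tryCatch(worker_demo(c(10, 20)),
                           error = function(e) conditionMessage(e))
print(scoping_result)
```

If a variable with the same name happens to exist in the global workspace of the main session, the lookup succeeds there. That may be why the serial apply version appears to work, although this is a guess, since the question does not show the workspace.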
optim and optimx provide a way to pass additional arguments to the function, which may be a better solution.
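As a rough sketch of that second approach: optim forwards any extra named arguments (its `...`) to fn, so the per-row values can be made explicit parameters of the objective. The objective toy_objective and the matrix toy_rows below are made up for illustration and stand in for VB04 and test; the bounds and starting values are copied from the question.

```r
library(parallel)

# Hypothetical stand-in for VB04: strike and yield_avg are explicit
# arguments instead of free variables.
toy_objective <- function(par, strike, yield_avg) {
  (par[1] - strike)^2 + (par[2] - yield_avg)^2
}

cl <- makePSOCKcluster(2)
# The objective is defined outside the worker function, so it is exported.
clusterExport(cl, "toy_objective", envir = environment())

# Made-up data: one row per problem; column 1 is strike, column 2 is yield_avg.
toy_rows <- cbind(c(1, 2), c(0.25, 0.75))

toy_res <- parApply(cl, toy_rows, 1, function(row) {
  # strike and yield_avg are passed through optim's `...` to toy_objective.
  fit <- optim(par = c(1, 0.5), fn = toy_objective,
               strike = row[[1]], yield_avg = row[[2]],
               method = "L-BFGS-B", lower = c(0, 0), upper = c(Inf, 1))
  c(fit$value, fit$par)
})
stopCluster(cl)

# One column per input row: objective value, then the two fitted parameters.
print(toy_res)
```

One thing to watch with this pattern: the extra argument names should not collide with (or partially match) optim's own leading arguments par, fn and gr.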
Upvotes: 1