Reputation:
I am trying to obtain estimated constrained coefficients using RSS. The beta coefficients are constrained between [0,1] and sum to 1. Additionally, my third parameter is constrained between (-1,1). Utilizing the below I can obtain a nice solution using simulated variables but when implementing the methodology on my real data set I keep arriving at a non-unique solution. In turn, I'm wondering if there is a more numerically stable way to obtain my estimated parameters.
set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
g = function(v){
v1 = v[1]
v2 = v[2]
u = vector(mode = "double", length = 3)
# ensure in (0,1)
v1 = 1 / (1 + exp(-v1))
# ensure add up to 1
u[1:2] = c(v1, 1 - sum(v1))
# ensure between [-1,1]
u[3] = (v2^2 - 1) / (v2^2 + 1)
u
}
res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))
Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html
Upvotes: 1
Views: 1409
Reputation: 1829
The simplest way to solve optimization problems with equality and inequality constraints will most likely be through the "augmented Lagrangian" approach. In R this is, for example, realized in the alabama package.
# function and gradient
fn = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
gr = function(u) numDeriv::grad(fn, u)
# constraint sum(u) == 1
heq = function(u) sum(u) - 1
# constraints 0 <= u[1],u[2] <= 1; -1 <= u[3] <= 1
hin = function(u) c(u[1], u[2], 1-u[1], 1-u[2], u[3]+1, 1-u[3])
sol_a = alabama::auglag(c(0.5, 0.5, 0), fn, gr, hin=hin, heq=heq)
sol_a
## $par
## [1] 1.0000000 0.3642904 -0.3642904
## $value
## [1] 10094.74
## ...
## $hessian
## [,1] [,2] [,3]
## [1,] 15009565054 9999999977 9999992926
## [2,] 9999999977 10000002578 9999997167
## [3,] 9999992926 9999997167 10000022569
For other packages containing an "augmented Lagrangian" procedure see the CRAN Task View on optimization.
Upvotes: 1
Reputation: 50738
Since there has been no direct answer to your question so far, I'd like to show a way how to implement a parameter-constrained model in Stan/RStan. You should give this a try using your real data.
Doing Bayesian inference has the advantage of giving you posterior probabilities for your (constrained) model parameters. Point estimates including confidence intervals can then be easily calculated.
First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).
library(rstan);
rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());
We now define our Stan model. In this case, it's very simple, and we can make use of RStan's simplex
data type for vectors of non-negative values that sum to one.
model <- "
data {
int<lower=1> n; // number of observations
int<lower=0> k; // number of parameters
matrix[n, k] X; // data
vector[n] y; // response
}
parameters {
real a2; // a2 is a free scaling parameter
simplex[k] a; // a is constrained to sum to 1
real sigma; // residuals
}
model {
// Likelihood
y ~ normal(a2 * (X * a), sigma);
}"
Stan supports various constrained data types; I'd recommend taking a lot at the Stan manual for more complex examples.
Using the sample data from your original question, we can run our model:
# Sample data
set.seed(234);
k = 2;
a = diff(c(0, sort(runif(k-1)), 1));
n = 1e4;
x = matrix(rnorm(k * n), nc = k);
a2 = -0.5;
y = a2 * (x %*% a) + rnorm(n);
# Fit stan model
fit <- stan(
model_code = model,
data = list(
n = n,
k = k,
X = x,
y = as.numeric(y)),
iter = 4000,
chains = 4);
Running the model will only take a few seconds (after the parser has internally translated and compiled the model in C++), and the full results (posterior distributions for all parameters conditional on the data) are stored in fit
.
We can inspect the contents of fit
using summary
:
# Extract parameter estimates
pars <- summary(fit)$summary;
pars;
# mean se_mean sd 2.5% 25%
#a2 -0.4915289 1.970327e-04 0.014363398 -0.5194985 -0.5011471
#a[1] 0.7640606 2.273282e-04 0.016348488 0.7327691 0.7527457
#a[2] 0.2359394 2.273282e-04 0.016348488 0.2040952 0.2248482
#sigma 1.0048695 8.746869e-05 0.007048116 0.9909698 1.0001889
#lp__ -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
# 50% 75% 97.5% n_eff Rhat
#a2 -0.4916061 -0.4819086 -0.4625947 5314.196 1.0000947
#a[1] 0.7638723 0.7751518 0.7959048 5171.881 0.9997468
#a[2] 0.2361277 0.2472543 0.2672309 5171.881 0.9997468
#sigma 1.0048994 1.0095420 1.0187554 6492.930 0.9998086
#lp__ -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841
You can see that a[1]+a[2]=1
.
Plotting parameter estimates including confidence intervals is also easy:
plot(fit);
Upvotes: 1