user5087936

Regression with equality and inequality constrained coefficients in R

I am trying to estimate constrained coefficients by minimizing the residual sum of squares (RSS). The beta coefficients are constrained to [0,1] and must sum to 1. Additionally, my third parameter is constrained to (-1,1). With the code below I obtain a nice solution using simulated variables, but when I apply the same approach to my real data set I keep arriving at a non-unique solution. I'm therefore wondering whether there is a more numerically stable way to obtain my estimated parameters.

set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))  # k weights in [0, 1] that sum to 1
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5                            # third parameter in (-1, 1)
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}  # RSS
g = function(v){
  v1 = v[1]
  v2 = v[2]
  u = vector(mode = "double", length = 3)

  # logistic transform ensures v1 is in (0,1)
  v1 = 1 / (1 + exp(-v1))

  # the two betas add up to 1
  u[1:2] = c(v1, 1 - v1)

  # maps v2 into [-1, 1)
  u[3] = (v2^2 - 1) / (v2^2 + 1)
  u
}

res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))

Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html

Upvotes: 1

Views: 1409

Answers (2)

Hans W.

Reputation: 1829

The simplest way to solve optimization problems with both equality and inequality constraints is most likely the "augmented Lagrangian" approach. In R this is realized, for example, in the alabama package.

# objective function (RSS) and its numerical gradient (requires numDeriv)
fn = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
gr = function(u) numDeriv::grad(fn, u)

# equality constraint: only the two betas sum to 1
heq = function(u) sum(u[1:2]) - 1
# inequality constraints 0 <= u[1],u[2] <= 1; -1 <= u[3] <= 1
hin = function(u) c(u[1], u[2], 1-u[1], 1-u[2], u[3]+1, 1-u[3])

sol_a = alabama::auglag(c(0.5, 0.5, 0), fn, gr, hin = hin, heq = heq)
# compare the constrained estimates with the true parameters
rbind(Est = sol_a$par, True = c(a, a2))

Note that the equality constraint has to be applied to the first two coefficients only; constraining sum(u) would incorrectly include u[3] as well.

For other packages implementing an "augmented Lagrangian" procedure, see the CRAN Task View on Optimization.
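One such alternative is the Rsolnp package, whose solnp routine accepts the equality constraint and box bounds directly. A minimal sketch (assuming Rsolnp is installed, and that fn and the simulated data from the question are in scope):

library(Rsolnp)

# the two betas must sum to 1 (eqB gives the right-hand side)
eqfun = function(u) sum(u[1:2])

sol_b = solnp(pars = c(0.5, 0.5, 0), fun = fn,
              eqfun = eqfun, eqB = 1,
              LB = c(0, 0, -1), UB = c(1, 1, 1))
sol_b$pars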

Upvotes: 1

Maurits Evers

Reputation: 50738

Since there has been no direct answer to your question so far, I'd like to show how to implement a parameter-constrained model in Stan/RStan. You should give this a try with your real data.

Bayesian inference has the advantage of giving you posterior distributions for your (constrained) model parameters. Point estimates and credible intervals can then be calculated easily.

  1. First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).

    library(rstan);
    rstan_options(auto_write = TRUE);
    options(mc.cores = parallel::detectCores());
    
  2. We now define our Stan model. In this case, it's very simple, and we can make use of Stan's simplex data type for vectors of non-negative values that sum to one.

    model <- "
    data {
        int<lower=1> n;   // number of observations
        int<lower=0> k;   // number of parameters
        matrix[n, k] X;   // data
        vector[n] y;      // response
    }
    
    parameters {
        real a2;          // a2 is a free scaling parameter
        simplex[k] a;     // a is constrained to sum to 1
        real<lower=0> sigma;   // residual standard deviation (must be positive)
    }
    
    model {
        // Likelihood
        y ~ normal(a2 * (X * a), sigma);
    }"
    

    Stan supports various constrained data types; for example, since your third parameter is constrained to (-1, 1), you could declare it as real<lower=-1, upper=1> a2;. I'd recommend taking a look at the Stan manual for more complex examples.

  3. Using the sample data from your original question, we can run our model:

    # Sample data
    set.seed(234);
    k = 2;
    a = diff(c(0, sort(runif(k-1)), 1));
    n = 1e4;
    x = matrix(rnorm(k * n), nc = k);
    a2 = -0.5;
    y = a2 * (x %*% a) + rnorm(n);
    
    # Fit stan model
    fit <- stan(
        model_code = model,
        data = list(
            n = n,
            k = k,
            X = x,
            y = as.numeric(y)),
        iter = 4000,
        chains = 4);
    

    Running the model takes only a few seconds (after Stan has translated the model to C++ and compiled it), and the full results (posterior distributions for all parameters, conditional on the data) are stored in fit.

  4. We can inspect the contents of fit using summary:

    # Extract parameter estimates
    pars <- summary(fit)$summary;
    pars;
    #               mean      se_mean          sd          2.5%           25%
    #a2       -0.4915289 1.970327e-04 0.014363398    -0.5194985    -0.5011471
    #a[1]      0.7640606 2.273282e-04 0.016348488     0.7327691     0.7527457
    #a[2]      0.2359394 2.273282e-04 0.016348488     0.2040952     0.2248482
    #sigma     1.0048695 8.746869e-05 0.007048116     0.9909698     1.0001889
    #lp__  -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
    #                50%           75%         97.5%    n_eff      Rhat
    #a2       -0.4916061    -0.4819086    -0.4625947 5314.196 1.0000947
    #a[1]      0.7638723     0.7751518     0.7959048 5171.881 0.9997468
    #a[2]      0.2361277     0.2472543     0.2672309 5171.881 0.9997468
    #sigma     1.0048994     1.0095420     1.0187554 6492.930 0.9998086
    #lp__  -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841
    

    You can see that a[1] + a[2] = 1, and that the point estimate of a2 is close to its true value of -0.5.

    Plotting parameter estimates including credible intervals is also easy:

    plot(fit);
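
    If you prefer to work with the posterior draws directly, rstan::extract returns them as a named list; a minimal sketch, e.g. for a 95% credible interval for a2:

    # extract posterior draws (one vector per scalar parameter)
    draws <- rstan::extract(fit);
    # 95% credible interval for a2 from the posterior quantiles
    quantile(draws$a2, probs = c(0.025, 0.975));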
    


Upvotes: 1
