Chris437
Chris437

Reputation: 69

Estimating Parameters in R from a Weighted Sum of Lagged Variables

I am afraid I am stuck with an estimation problem.

I have two variables, X and Y. Y is explained by the weighted sum of n lagged values of X. My aim is to estimate the two parameters c(alpha0,alpha1) in:

Yt = Sum from j=1 to n of ( ( alpha0 + alpha1 * j ) * Xt-j )

enter image description here

Where Xt-j denotes the jth lag of X.

I came up with this approach because I thought it would be a good idea to estimate the slope of the weights rather than estimating one parameter for each lag of X added (I intent to set n very large).

To the model noise ut is added, which is assumed to be normally distributed with mean zero and standard deviation sigma.

Assuming I would like to set n=510, then I need the original series and 510 lagged series. In order to avoid any NAs in the series I transform the original data to "data_chopped", containing only the observations after the first 510 observations have been dropped, and the matrix "data_lagged" in which each column represents a lagged series:

library(stats)
data<-arima.sim(n=10000,list(ar=0.15,ma=0.1),mean=0.5)

data_chopped<-data[511:length(data)]

data_lagged<-matrix(nrow=length(data_chopped),ncol=510)
for (i in 1:510){
data_lagged[,i]<-head(data,-i)[(511-i):length(head(data,-i))]
}

#Check result:
cbind(data_chopped,data_lagged[,1:3])
#data_lagged[,1] is the first lag of the original data, data_lagged[,2] is the second lag, and so on. No NAs whatsoever to deal with 

To demonstrate the "working order" of my log-likelihood function and the generated series I first would like to fit an AR(3) model:

logl<-function(sigma,alpha,beta,gamma){
-sum(log((1/(sqrt(2*pi)*sigma)) * exp(-((
data_chopped
-alpha*data_lagged[,1]
-beta*data_lagged[,2]
-gamma*data_lagged[,3]
)^2)/(2*sigma^2))))
}

library(stats4)
mle(logl,start=list(sigma=1,alpha=0,beta=0,gamma=0),method="L-BFGS-B")

When I now try to estimate my model in the same way it just does not work. I never got a loop within a log-likelihood function to work, this is why I just wrote out the above model. So,

Yt = Sum from j=1 to n of ( ( alpha0 + alpha1 * j ) * Xt-j )

= (alpha+beta*1)*Xt-1 + (alpha+beta*2)*Xt-2 + (alpha+beta*3)*Xt-3 + ... + (alpha+beta*510)*Xt-510

logl<-function(sigma,alpha,beta){
-sum(log((1/(sqrt(2*pi)*sigma)) * exp(-((
data_chopped
-(alpha + beta*1)*data_lagged[,1]
-(alpha + beta*2)*data_lagged[,2]
-(alpha + beta*3)*data_lagged[,3]
-(alpha + beta*4)*data_lagged[,4]
-(alpha + beta*5)*data_lagged[,5]
...
-(alpha + beta*510)*data_lagged[,510]
)^2)/(2*sigma^2))))
}

library(stats4)
mle(logl,start=list(sigma=1,alpha=0.5,beta=0),method="L-BFGS-B")
Error in optim(start, f, method = method, hessian = TRUE, ...) : 
L-BFGS-B needs finite values of 'fn'

I do not get the error if I try just a very few lines:

logl<-function(sigma,alpha,beta){
-sum(log((1/(sqrt(2*pi)*sigma)) * exp(-((
data_chopped
-(alpha + beta*1)*data_lagged[,1]
-(alpha + beta*2)*data_lagged[,2]
-(alpha + beta*3)*data_lagged[,3]
-(alpha + beta*4)*data_lagged[,4]
-(alpha + beta*5)*data_lagged[,5]
)^2)/(2*sigma^2))))
}

library(stats4)
mle(logl,start=list(sigma=1,alpha=0.5,beta=0),method="L-BFGS-B")
Call:
mle(minuslogl = logl, start = list(sigma = 1, alpha = 0.5, beta = 0), 
method = "L-BFGS-B")

Coefficients:
 sigma      alpha       beta 
1.07797708  0.26178848 -0.04378526 

Can someone please help me out on this?

Upvotes: 0

Views: 607

Answers (1)

IRTFM
IRTFM

Reputation: 263362

I will echo the advice not to use the lag function. Its authors and early users may know what it does but the rest of us have had bad experience with it not delivering to expectations. I find the embed function useful for what I thought the lag function ought to do.

> embed(1:8, 3)
     [,1] [,2] [,3]
[1,]    3    2    1
[2,]    4    3    2
[3,]    5    4    3
[4,]    6    5    4
[5,]    7    6    5
[6,]    8    7    6

Let's say you want to look back at 6 times before the current time and do calculations by row. You need to accept and plan for the fact that it is now ambiguous what should be done with periods 1-6 since they will have incomplete data. I cannot figure out from your formula how one can estimate only two parameters when you have more than two lag periods, unless you apply some specific shape to the wear-off phenomenon .... linear perhaps ... you didn't say.

dfrm <- data.frame(y=rnorm(20), x=rnorm(20) )
dfrm$embx<- matrix(NA, ncol=7, nrow=20)
dfrm$embx[7:20, ] <- embed(dfrm$x, 7) * rep( (7:1)/7, each=14)
lm( y[7:20] ~ embx[7:20,], data=dfrm )

Call:
lm(formula = y[7:20] ~ embx[7:20, ], data = dfrm)

Coefficients:
  (Intercept)  embx[7:20, ]1  embx[7:20, ]2  embx[7:20, ]3  embx[7:20, ]4  embx[7:20, ]5  
       0.3065        -0.2371         0.9504         0.8601         0.5484         0.6621  
embx[7:20, ]6  embx[7:20, ]7  
       1.1619         4.8338  

This uses "full-strength" x_t and factors down to 1/7th strength for x_(t-7). That's a bit different than what your formula expressed since it did not have an x_t covariate but you should be able to construct a "slope" from the estimated coefficients.

Upvotes: 1

Related Questions